RED method for analyzing MySQL performance

This translation was prepared in anticipation of the start of the "DevOps Practices and Tools" course.

The RED (Rate, Errors, Duration) method is one of the popular approaches to performance monitoring. It is often used for monitoring microservices, although nothing prevents you from using it for databases such as MySQL.



In Percona Monitoring and Management (PMM) v2, all the necessary information is collected in the ClickHouse database, so building a dashboard to visualize these metrics with the built-in ClickHouse data source is a straightforward exercise.
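
To give a feel for the raw data the dashboard is built on, here is a minimal sketch of a query against the PMM query metrics table. It assumes the PMM 2.x schema (database pmm, table metrics, one row per query class per aggregation period); verify the column names against your installation, for example with DESCRIBE TABLE pmm.metrics.

```sql
-- A minimal sketch of the raw data behind the dashboard: per-query-class metrics
-- that PMM stores in ClickHouse. Column names assume the PMM 2.x schema
-- (database `pmm`, table `metrics`); verify them in your installation.
SELECT
    period_start,        -- start of the aggregation bucket (one minute by default)
    service_name,        -- the MySQL instance the queries ran on
    queryid,             -- query class ID used by Query Analytics
    num_queries,         -- how many times queries of this class ran in the bucket
    m_query_time_sum     -- total execution time (seconds) spent in the bucket
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 1 HOUR
ORDER BY period_start DESC
LIMIT 10;
```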



In addition to the RED panels, the dashboard got several extra panels that show some of the interesting things you can do with Grafana plus ClickHouse as a data source and the information we store about MySQL query performance.



Let's take a closer look at the dashboard.

We see the classic RED panels showing the Query Rate (queries per second), the Error Rate, and the average and 99th percentile Query Latency (query execution time) for all nodes in the system. The panels below break this information down by node, which is very useful for comparing their performance. If one of the nodes starts behaving differently from its otherwise similar peers, that is a reason to investigate.
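
For reference, the three RED series can be approximated directly from the same data with a query along these lines. This is a hand-written sketch rather than the dashboard's actual panel queries; it assumes the default 60-second aggregation buckets (period_length), and the per-bucket p99 is only an approximation.

```sql
-- A sketch of the three RED series over time, for illustration only.
-- Assumes 60-second aggregation buckets; the per-bucket p99 is approximated
-- as the maximum of the stored per-query-class p99 values.
SELECT
    period_start AS t,
    sum(num_queries) / 60                     AS query_rate_qps,   -- Rate
    sum(num_queries_with_errors) / 60         AS error_rate_eps,   -- Errors
    sum(m_query_time_sum) / sum(num_queries)  AS avg_latency_s,    -- Duration (avg)
    max(m_query_time_p99)                     AS p99_latency_s     -- Duration (p99, approx.)
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 3 HOUR
GROUP BY t
ORDER BY t;
```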



With the help of filters (“Filters” at the top of the dashboard), you can view only the data you need. For example, you can select only "sbtest" schema queries for hosts located in the "datacenter4" region:

This ad-hoc filtering is very convenient. You can use regular expressions in filters, search for a specific QueryID, analyze queries from specific client hosts, and so on. For a description of the columns available in ClickHouse, see the post Advanced Query Analysis in Percona Monitoring and Management with Direct ClickHouse Access.
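
As a sketch of what such ad-hoc filtering boils down to, the WHERE clause below narrows the same rate calculation to the sbtest schema, one client host, and fingerprints matching a regular expression. The client host value is hypothetical, and the column names again assume the PMM 2.x schema.

```sql
-- Ad-hoc filtering sketch: the rate calculation restricted in the WHERE clause.
SELECT
    period_start AS t,
    sum(num_queries) / 60 AS query_rate_qps
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 3 HOUR
  AND schema = 'sbtest'                             -- only the sbtest schema
  AND client_host = '10.0.0.42'                     -- a specific (hypothetical) client host
  AND match(fingerprint, '^SELECT .* FROM sbtest')  -- regular expression on the query text
GROUP BY t
ORDER BY t;
```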



From most of the panels you can jump straight to Query Analytics to view detailed information about query performance. And if you notice something unusual on one of the hosts, you can view that host's queries through "Data Links": click on the graph and follow the provided link:

For each system individually, you can look at the same RED metrics as for the system as a whole. By default, I would leave these panels collapsed, especially if you are monitoring many hosts.



That covers the RED method panels. Now let's take a look at the additional panels in this dashboard.

Row Based Efficiency shows how many rows were examined for each row returned or changed. Values greater than 100 typically indicate poor indexes or very complex queries that read a lot of data but return only a few rows. Both cases deserve analysis.



Time-Based Efficiency is based on the same math but looks at query execution time rather than the number of rows scanned. It helps diagnose problems such as a slow disk or query contention. On a well-performing system you would typically expect fractions of a millisecond per row returned or changed. Queries that return or update many rows will have a lower value.
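
Both efficiency panels boil down to simple ratios over the same columns. Here is a sketch, assuming the PMM 2.x column names; the actual panel queries may differ.

```sql
-- A sketch of both efficiency metrics per service: work done for every row
-- returned or changed. greatest(..., 1) just guards against division by zero.
SELECT
    service_name,
    sum(m_rows_examined_sum)
        / greatest(sum(m_rows_sent_sum) + sum(m_rows_affected_sum), 1) AS rows_examined_per_row,  -- Row Based Efficiency
    sum(m_query_time_sum)
        / greatest(sum(m_rows_sent_sum) + sum(m_rows_affected_sum), 1) AS seconds_per_row         -- Time-Based Efficiency
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 3 HOUR
GROUP BY service_name
ORDER BY rows_examined_per_row DESC;
```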



Queries Per Host speaks for itself, and next to it it is very useful to see Query Load Per Host, which shows the average number of concurrently active queries. Here we can see that although mysql4 does not have the highest query rate, it has the highest load and the highest average number of active queries.
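
The load figure is simply the total query execution time divided by wall-clock time, which gives the average number of concurrently active queries. A per-host sketch over the last hour, under the same schema assumptions as above:

```sql
-- Query rate vs. query load per host. "Load" is total query execution time
-- divided by the wall-clock window (3600 seconds here), i.e. the average
-- number of queries running concurrently.
SELECT
    service_name,
    sum(num_queries) / 3600       AS query_rate_qps,
    sum(m_query_time_sum) / 3600  AS avg_active_queries   -- query load
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 1 HOUR
GROUP BY service_name
ORDER BY avg_active_queries DESC;
```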



Thinking about what other metrics might be useful, I added the following additional panels:

These panels split Query Processing Efficiency into READ queries (which return rows) and WRITE queries (which report rows affected).



Query Time Based Efficiency is the same metric as described above, only broken out by these query types.
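
Here is a sketch of how such a READ/WRITE split can be expressed with ClickHouse conditional aggregates. The classification rule used here (queries that sent rows vs. queries that affected rows) is my own illustrative assumption, not necessarily the exact logic behind the dashboard panels.

```sql
-- READ/WRITE efficiency split using conditional aggregation; the classification
-- by "sent rows" vs. "affected rows" is an illustrative assumption.
SELECT
    service_name,
    sumIf(m_query_time_sum, m_rows_sent_sum > 0)
        / greatest(sumIf(m_rows_sent_sum, m_rows_sent_sum > 0), 1)         AS read_seconds_per_row,
    sumIf(m_query_time_sum, m_rows_affected_sum > 0)
        / greatest(sumIf(m_rows_affected_sum, m_rows_affected_sum > 0), 1) AS write_seconds_per_row
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 3 HOUR
GROUP BY service_name;
```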



Data Crunching Efficiency is a slightly different look at the same data: it shows how many rows a query examines relative to its execution time. On the one hand, this reflects the raw processing power of the system: a box with many cores and all data in memory can chew through millions of rows per second and do a lot of work. But that says nothing about query efficiency; in fact, systems that process a lot of data quickly are often doing many full table scans.
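
Expressed as a query, this is rows examined per second of query execution time, per host; again a sketch under the same schema assumptions.

```sql
-- Rows examined per second of query execution time, per host. High values mean
-- raw crunching throughput rather than query efficiency; they frequently come
-- from full table scans. greatest(..., 0.001) guards against division by zero.
SELECT
    service_name,
    sum(m_rows_examined_sum) / greatest(sum(m_query_time_sum), 0.001) AS rows_examined_per_query_second
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 3 HOUR
GROUP BY service_name
ORDER BY rows_examined_per_query_second DESC;
```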



Finally, there are several query lists.

The most frequent queries, the slowest queries (by average execution time), the queries producing the highest load, and the queries that failed with errors. You can also see these queries in Query Analytics, but I wanted to show them here as an example.
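
All four lists can be produced from the same aggregation, just ordered by different columns. A sketch of the "highest load" variant, which also computes the values the other lists sort by:

```sql
-- The ten query classes with the highest load over the last hour, alongside the
-- values the other lists sort by (frequency, average execution time, errors).
SELECT
    queryid,
    any(fingerprint)                          AS query_text,    -- normalized query
    sum(num_queries)                          AS executions,    -- "most frequent"
    sum(m_query_time_sum) / sum(num_queries)  AS avg_time_s,    -- "slowest"
    sum(m_query_time_sum)                     AS total_load_s,  -- "highest load"
    sum(num_queries_with_errors)              AS errors         -- "with errors"
FROM pmm.metrics
WHERE period_start >= now() - INTERVAL 1 HOUR
GROUP BY queryid
ORDER BY total_load_s DESC
LIMIT 10;
```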



Are you interested? You can install the dashboard in Percona Monitoring and Management (PMM) v2 from Grafana.com.
