Performance debugging basics: the MySQL slow log

In this blog post I’ll show you how simple and useful the MySQL slow query log can be for finding performance problems, in this case in a Drupal 7 site running a complex import task as a drush batch process.

Take a look at the Zabbix graph below, showing the CPU utilization of a staging server for one of our projects:

Zabbix CPU utilization graph during a performance problem

On the left side, between midnight and 6:30, you can see a lot of colors in the graph. The green area is idle time and the blue area is a PHP application doing some work, but the dark yellow and purple areas are the bad ones: they indicate that there’s way too much disk I/O going on while the PHP application is running.

As this server has no New Relic APM Pro account, doing a deep code inspection wasn’t possible. I did, however, see that most of the processing time was spent in MySQL, as you can see in the standard (free) New Relic APM graph below:

New Relic APM graph of a drush command

The first and most basic thing you then have to do is turn on the MySQL slow log on the server. The slow log is useful for logging two things: queries that take longer than a configurable cut-off time (long_query_time), and queries that don’t use an index.

In MySQL you simply add these settings to your my.cnf file. This example config is for a MariaDB 5.5 server; if you run a different version or a different fork of MySQL, check the documentation for the proper slow log configuration, as the option names might differ:

# Slow logging (kept off in the config; we enable it at runtime below)
slow_query_log = OFF
log_slow_queries = /var/log/mysql/slow.log
# Log every query that takes longer than 2 seconds
long_query_time = 2
# Also log queries that don't use an index
log-queries-not-using-indexes

In this example config I’ve set the query cut-off time to 2 seconds, so anything that takes longer than 2 seconds to execute gets logged. Make sure you create the directory (if it doesn’t exist yet) and touch an empty slow.log file:

# Create the directory
mkdir -p /var/log/mysql
chown mysql:mysql /var/log/mysql

# Create an empty log file
touch /var/log/mysql/slow.log
chown mysql:mysql /var/log/mysql/slow.log

Restart your MySQL server to load the configuration changes.

This config does not turn on slow logging yet; it just configures the log location and sets the options. To actually enable it, you need to run a command at a MySQL prompt with admin privileges.

The example below uses a Plesk server (which we use for our shared development environment), but on your system it could be as simple as running “mysql” as root:

[root@acc01 ~]# mysql -uadmin -p`cat /etc/psa/.psa.shadow`
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2849
Server version: 5.5.42-MariaDB-log MariaDB Server
Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [mysql]> SET GLOBAL slow_query_log = 'ON';

We don’t enable this by default on the server, as the log can fill up quite fast. So do not forget to turn it back off when you are done, by running the same command with the value OFF:

SET GLOBAL slow_query_log = 'OFF';

The benefit of using this command is that you can run it without having to reload or restart the MySQL server. So once the configuration is set up, enabling or disabling the slow log can be done with just this simple command.
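
As a quick sanity check (not part of the original setup), you can also ask the server for the current state of these settings at any time from the same MySQL prompt:

-- Check whether slow logging is currently on and where it writes to
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
-- Check the current cut-off time in seconds
SHOW GLOBAL VARIABLES LIKE 'long_query_time';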

Right, now that we have our slow log turned on, let’s see what a tail -f /var/log/mysql/slow.log shows while our PHP application runs. I’ll filter out all the less interesting queries (and there will probably be a lot of them) and go straight to the performance killer:

# Time: 150625 16:46:17
# User@Host: user[user] @ localhost []
# Thread_id: 16 Schema: db_prod QC_hit: No
# Query_time: 1.522648 Lock_time: 0.000053 Rows_sent: 0 Rows_examined: 200343
SET timestamp=1435243577;
UPDATE user_member_status SET is_invoiced=0;

This query would show up every 2 seconds, meaning the PHP script probably calls it every time it starts a new batch of items to process. MySQL logs this query because it doesn’t use an index and therefore has to scan all the rows in the table, which means a lot of blocks have to be read from disk, something you want to avoid at all costs in a database. There are just over 200 000 rows and the query takes 1.522 seconds to run, which is a very long time for a single query.
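
If you want to confirm the full table scan yourself, EXPLAIN will show it. MariaDB 5.5 can only EXPLAIN SELECT statements, so a minimal sketch is to inspect an equivalent SELECT on the same table (names taken from the log entry above):

-- MariaDB 5.5 can't EXPLAIN an UPDATE, so look at an equivalent SELECT instead
EXPLAIN SELECT is_invoiced FROM user_member_status;
-- "type: ALL" in the output confirms a full table scan over all the rows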

If you look at the query, you can see that it really doesn’t make sense to run it over and over again. It sets the is_invoiced column to 0 for every row in the table, so running it more than once is pointless.

It could also be missing a WHERE clause, but that’s something only the developer who wrote the code can tell.
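
Just as an illustration of what such a fix could look like (the WHERE clause and the index below are hypothetical, not taken from the actual project), a more targeted version might be:

-- Hypothetical: only touch the rows that actually need to change
UPDATE user_member_status SET is_invoiced = 0 WHERE is_invoiced = 1;

-- Hypothetical: an index on is_invoiced can help MySQL find those rows
-- without a full table scan (how much it helps depends on how many rows match)
ALTER TABLE user_member_status ADD INDEX idx_is_invoiced (is_invoiced);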

So, with this information I went back to the developer, who then confirmed it was a bug: this query should only be run at the start of the process. He made a fix, pushed it to git, made a new release with our Bamboo deployment server, pushed it to acc01 and within 15 minutes of my “discovery” the performance fix was live.

If we look back at the Zabbix graph from above, we can see that on the right side there is again a lot of CPU activity, but this time it's the good kind. The disk I/O bottleneck is gone, as we removed the “bad” query that was locking up the server:

Bad query locking up the server

That’s it: another performance problem found and fixed using some basic devops tools. I hope you learned something from this blog post, and remember to turn off your slow log when you are done!
