The Importance of Analyzing and Optimizing your SQL Queries

I’m sure many of you have heard the importance of optimizing your SQL queries for the optimizer. You may have doubted this, maybe you thought your query was too simple. Well, this post is going to show you exactly why query optimization is so important.

I’ll start with the graph:

For statistical analysis my solver periodically gathers statistics about the state of the database. The graph above shows the number of boards by status. “New” boards have not been processed yet, while “Done” boards have.

As you can see over the initial few days it generated about 2.8 million boards. After I optimized only one of my queries and in the past 18 hours have generated over 18 million boards.

I’m sure you remember the query we fixed from last week, that’s the query I optimized. Let’s look at it again:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
ORDER BY id DESC LIMIT 1

Looking at the status of the server showed the database engine was pegging the CPU. Let’s figure out why that is!

MariaDB has a handy tool, SHOW PROCESSLIST to view what the server is processing at the moment. While running with the old query, that query showed up every time I tried it, so I decided to investigate the query. The ANALYZE tool was made to help investigate query performance:

ANALYZE
SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= 14537478
ORDER BY id DESC LIMIT 1

FieldValue
id1
select_typeSIMPLE
tableboards
typeref
possible_keysPRIMARY, status, id_status
keystatus
key_len1
refconst
rows11,369,300
r_rows8,014,010.00
filtered100.00
r_filtered0.00
ExtraUsing where

The analyze output indicates even though we’re using the id_status key values in our where clause, it still uses the key created on status. This is problematic, because it means that even though we should be able to use the key built exactly to optimize this query, we ended up reading over 8 million rows to get the table result. That’s a huge cost to grab a single row.

So why are we doing this? Turns out, MariaDB doesn’t support DESC order keys. So that ORDER BY id DESC is causing us to not use the Primary Key, and we end up going to the status key, and then with those results we search for the id closest to the target.

That means to get the benefit of the Primary Key we need to invert the query:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id >= ?
ORDER BY id ASC LIMIT 1

Here’s the analyze result for that:

FieldValue
id1
select_typeSIMPLE
tableboards
typeref
possible_keysPRIMARY, status, id_status
keystatus
key_len1
refconst
rows11,590,310
r_rows1
filtered100.00
r_filtered100.00
ExtraUsing index condition; Using where

So we’re reading only 1 row, which is good, but we’re still using the status key. And looking at the behavior of the server, we’re still pegging the CPU. Something is still wrong, the database engine isn’t using the right index. Well, MariaDB gives us a way to give the database engine a hint. The result is this query:

SELECT id, tiles
FROM boards
USE INDEX (id_status)
WHERE status = 'NEW' AND id >= 14537478
ORDER BY id ASC LIMIT 1

With this analyze:

FieldValue
id1
select_typeSIMPLE
tableboards
typerange
possible_keysid_status
keyid_status
key_len9
ref(NULL)
rows11,626,588
r_rows1
filtered100.00
r_filtered100.00
ExtraUsing index condition

Finally we’re using the right index, and that means some other interesting things happen. For instance, we stop using the WHERE clause and are really just using the index. The query is much faster, even the ANALYZE finished immediately while the prior ANALYZE instances took over a minute. Now after running that query, the server’s CPU load has fallen to below 10%. This means we should be able to scale out the runner a bit better, at least, until we start hitting our next bottleneck.

Tile Set Re-completed, Error Found

Good news everyone!

Yesterday I finished rebuilding the Eternity II tile set, and today I determined the source of the anomalies in the growth of generated boards.

Rebuilding the tile set just took a bunch of time to sit down and go through the physical tile pieces and convert them to a CSV file. I won’t bore you with those details.

More interesting is the programming error. Let’s see what was corrupting my data! Let’s start with this handy SQL statement:

SELECT max(id) AS id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
LIMIT 1

What do you think this query accomplishes? You can probably determine the intent. I was trying to get the maximum tile id below some limit, and some data stored with it. My runner randomly generates the limit value, hopefully giving a good mix of rows from different parts of the computation.

However, there is an interesting thing that happens with this query. It returns the correct id value, but it returns the data portion from an arbitrary row. It returned whatever the database engine happens to read first that meets the WHERE clause. This means I was processing the same board configuration multiple times, but skipping the ones identified by the id. This would cause both a lot of wasted calculations, and a lot of missed configurations.

Oof, no fun at all. A restart was really required.

After some digging and experimentation, the corrected query looks like this:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
ORDER BY id DESC LIMIT 1

Moving the “max” calculation from the fields section of the query into the ORDER BY clause means the query will return the paired row id and data together.

Now that I have corrected the error, on to processing board configurations!

Data Corruption – Resetting Eternity II Progress

Disaster has struck!

I was investigating anomalies in the growth of generated boards and counted the occurrences of each side in my tile configuration. This allowed me to determine at what scale the growth should be at each level.

Unfortunately, doing so revealed my tile set is corrupt! Two of the sides occur an odd number of times in my data set!

Because this is a tile matching problem, the number of occurrences of each side must be even. This is apparent because each side must match with another side of the same type, requiring two same sides for each edge. Long story short, if any side appears an odd number of times, I can never complete the puzzle.

This corrupts my entire generated board state so far and requires me to reset and with a regenerated tile set. Boo.

I think something else is wrong with the generation algorithm and need to investigate that. For now, I’m just going to go through the process of regenerating my tile set.