How a Read Query Can Write to Disk: A PostgreSQL Story

By Sam Bleckley on 17 December 2015

Here's a funny relational database story from earlier this year.

One of our clients had a medium-large database — millions of rows — and their business logic required some pretty fancy queries across that data: more sophisticated and more costly than just fetching records by id. So we ended up doing quite a bit of query tuning to get everything running smoothly.

This story begins with the discovery of two obviously (ahem) unrelated problems:

  1. Some SELECT operations were appallingly slow, even though EXPLAIN suggested that they were using sensible indexes, fast sort strategies, and reasonable LIMITs (a sketch of what we saw follows this list).

  2. We were running on Amazon RDS, and using up our allotted I/O operations far faster than our back-of-the-envelope calculations predicted. In particular, there were more writes than we expected. Writes have a major impact on the overall latency of the system, so this was a concern.
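For context, here's roughly what problem #1 looked like. The query below is a hypothetical stand-in (table, column, and index names are invented for illustration), but the shape of the plan matches what we saw: a plain EXPLAIN shows an index scan feeding a sort and a small LIMIT, which looks perfectly reasonable.

    EXPLAIN
    SELECT *
    FROM events
    WHERE account_id = 42
    ORDER BY created_at DESC
    LIMIT 20;

    Limit  (cost=...)
      ->  Sort  (cost=...)
            Sort Key: created_at DESC
            ->  Index Scan using events_account_id_idx on events  (cost=...)
                  Index Cond: (account_id = 42)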

One of these problems dealt strictly with reads, and one dealt strictly with writes... right?

We spent some time scratching our heads, but all became clear when we ran a full EXPLAIN ANALYZE on the select query and saw a strange sort method, something like:

Sort Method: external [...] Disk: 27421kB

Even though the query was LIMITed to return just a few records, the matching rows had already been selected using several indexes, and the ORDER BY clause still had to sort a considerable chunk of them before the LIMIT could be applied. So considerable, in fact, that the sort exhausted our connection's allotted working memory, and Postgres started writing sorted chunks to disk.
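Here's the same hypothetical query under EXPLAIN ANALYZE (row counts and timings invented, but the shape is what we saw):

    EXPLAIN ANALYZE
    SELECT *
    FROM events
    WHERE account_id = 42
    ORDER BY created_at DESC
    LIMIT 20;

    Limit  (actual time=1482.31..1482.35 rows=20 loops=1)
      ->  Sort  (actual time=1482.30..1482.33 rows=20 loops=1)
            Sort Key: created_at DESC
            Sort Method: external merge  Disk: 27421kB
            ->  Index Scan using events_account_id_idx on events
                  (actual time=0.08..310.20 rows=350000 loops=1)

The index scan returns 350,000 rows, and every one of them has to pass through the sort before the LIMIT can discard all but 20.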

Wait, our select query was writing to disk?

Ahah!

If you're suffering in a similar situation, the problem is the low default value of work_mem. From the PostgreSQL docs:

[work_mem] specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value defaults to four megabytes (4MB). Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.
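Before changing server-wide configuration, you can experiment per session using standard PostgreSQL commands (the 64MB value below is only an example; size it to your own workload):

    -- See the current setting (the default is 4MB):
    SHOW work_mem;

    -- Raise it for this session only:
    SET work_mem = '64MB';

    -- Or scope the change to a single transaction:
    BEGIN;
    SET LOCAL work_mem = '64MB';
    -- ... run the expensive query here ...
    COMMIT;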

The solution, if you're using RDS, is to head to your parameter groups and adjust work_mem to be larger than the disk usage reported by EXPLAIN ANALYZE.
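Once the new setting is in effect, re-run EXPLAIN ANALYZE to confirm the sort now fits in memory. The line that previously reported an external sort with disk usage should instead look something like this (memory figure invented; with a LIMIT you may see a top-N heapsort instead of a quicksort):

    Sort Method: quicksort  Memory: 28075kB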

Increasing work_mem is a tradeoff: every connection is allowed to consume more memory, but in exchange you get fast, in-memory sorts, and your read queries will stop issuing high-impact writes.
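To put rough numbers on that tradeoff, recall the warning in the docs passage above: the total can be many times work_mem. With, say, 100 active connections each running a query with two sort or hash operations, a work_mem of 64MB permits roughly 100 × 2 × 64MB, about 12.8GB, of sort memory in the worst case. Size the setting against your instance's RAM and connection count, not just against the single query you're tuning.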