Index Your Foreign Keys

By Ross Hunter on 12 06 2014

We add indexes to our database in order to speed up read operations. Basically, an index is a sorted copy of a particular set of columns the database uses to quickly look up the attributes of another table, most commonly, foreign keys of relations. There are several ways this can actually be implemented on the database; essentially, we tell the database which values we'll be looking up most often and it sorts the logical and physical storage of the information to make those operations faster.

One downside to this process is that it takes more physical space on disk because we're keeping extra copies of data in our database. Another drawback is that because we're keeping these indexes up to date, we also have extra writes whenever that data changes.

I would advise you to index all of your foreign keys, unless you have good reasons not to. Also, if you're already down with indexes, don't produce any example migration code that doesn't include index: true on your foreign keys. I think I was late to the database index game because not enough people are preaching it.

To add indexes to your database, create a migration.

You can use this script to find missing indexes in your application.

Here are some performance numbers from an application I have been working on, before and after adding database indexes.

Before

1)
EXPLAIN ANALYZE SELECT "cards".* FROM "cards" WHERE "cards"."expansion_id" = 130 ORDER BY name;
-------------------------------------------------------------------------------
Sort (cost=1367.55..1368.17 rows=248 width=348) (actual time=12.634..12.654 rows=249 loops=1)
Sort Key: name
Sort Method: quicksort Memory: 148kB
-> Seq Scan on cards (cost=0.00..1357.69 rows=248 width=348) (actual time=0.195..9.857 rows=249 loops=1)
Filter: (expansion_id = 130)
Rows Removed by Filter: 22526
Total runtime: 12.837 ms
(6 rows)

2)
EXPLAIN ANALYZE SELECT { //BUNCH O' STUFF// } FROM "transactions" INNER JOIN "cards" ON "cards"."id" = "transactions"."card_id" WHERE "transactions"."trade_id" = 19 AND "transactions"."giver_id" = 1 ORDER BY cards.name DESC;
-------------------------------------------------------------------------------
Sort (cost=9.39..9.39 rows=1 width=376) (actual time=0.113..0.113 rows=2 loops=1)
Sort Key: cards.name
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.29..9.38 rows=1 width=376) (actual time=0.047..0.056 rows=2 loops=1)
-> Seq Scan on transactions (cost=0.00..1.06 rows=1 width=28) (actual time=0.016..0.016 rows=2 loops=1)
Filter: ((trade_id = 19) AND (giver_id = 1))
Rows Removed by Filter: 2
-> Index Scan using cards_pkey on cards (cost=0.29..8.31 rows=1 width=348) (actual time=0.015..0.017 rows=1 loops=2)
Index Cond: (id = transactions.card_id)
Total runtime: 0.226 ms
(10 rows)

AFTER

1)
EXPLAIN ANALYZE SELECT "cards".* FROM "cards" WHERE "cards"."expansion_id" = 130 ORDER BY name;
-------------------------------------------------------------------------------
Sort (cost=461.97..462.59 rows=248 width=348) (actual time=2.871..2.894 rows=249 loops=1)
Sort Key: name
Sort Method: quicksort Memory: 148kB
-> Index Scan using index_cards_on_expansion_id on cards (cost=0.29..452.11 rows=248 width=348) (actual time=0.018..0.221 rows=249 loops=1)
Index Cond: (expansion_id = 130)
Total runtime: 2.944 ms
(6 rows)

2)
EXPLAIN ANALYZE SELECT { //BUNCH O' STUFF// } FROM "transactions" INNER JOIN "cards" ON "cards"."id" = "transactions"."card_id" WHERE "transactions"."trade_id" = 19 AND "transactions"."giver_id" = 1 ORDER BY cards.name DESC;
-------------------------------------------------------------------------------
Sort (cost=9.39..9.39 rows=1 width=376) (actual time=0.056..0.056 rows=2 loops=1)
Sort Key: cards.name
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.29..9.38 rows=1 width=376) (actual time=0.033..0.042 rows=2 loops=1)
-> Seq Scan on transactions (cost=0.00..1.06 rows=1 width=28) (actual time=0.014..0.016 rows=2 loops=1)
Filter: ((trade_id = 19) AND (giver_id = 1))
Rows Removed by Filter: 2
-> Index Scan using cards_pkey on cards (cost=0.29..8.31 rows=1 width=348) (actual time=0.011..0.011 rows=1 loops=2)
Index Cond: (id = transactions.card_id)
Total runtime: 0.103 ms
(10 rows)

Speed increased by 4.4x on a simple query and 2.2x on a complex query. Looking around the internet you will find many other people with even greater speed gains.

You can index non-foreign key fields if you query them a lot, such as name, or some other alternative id, and see incredible performance improvements.