On-Disk Caching with SQLite
2 Jul 2020
Recently, a project I was working on required a cache that would ideally be capable of storing tens of millions of entries, reaching into the multiple-terabyte range. At this scale, using a single spinning disk is significantly cheaper than storing everything in memory, which Redis or Memcached would require. Another classic KV store, LevelDB, also ends up being impractical because it writes the value for each key to disk multiple times. If your values are small, LevelDB can be really performant, but in my case the values are moderately sized (tens of kilobytes) and writing them to disk multiple times quickly becomes user-visible. Finally, I expected that storing this many entries on disk as individual files in a folder would cause issues with inode exhaustion, and in any case the ACID guarantees that a filesystem provides have never been super clear to me.
This blog post describes how to build an on-disk cache with SQLite that's capable of evicting entries when the cache gets too large. SQLite is a bit more machinery than I would've initially wanted, but it scales to the size I want, provides ACID guarantees, and is surprisingly performant. The only difficulty is that it doesn't have a good way to randomly select rows for eviction.
The underlying idea to get around this is to maintain a random shuffle of keys in the database, so that 'randomly evicting' simply consists of deleting the last N rows. The reason for random eviction (instead of LRU) is that I didn't want Get operations to cause any writes to disk, which an LRU cache would need in order to keep track of usage.
All operations are fairly fast, having a runtime that's logarithmic in the number of rows.
Schema
The table schema:
CREATE TABLE cache (key TEXT NOT NULL PRIMARY KEY, val BYTEA);
The PRIMARY KEY clause creates an index on the key column, which enables efficient lookups by key.
Startup
When the process using the cache starts up, it first needs to get the max rowid and store it in a variable called n (using 0 if the table is empty, since MAX returns NULL). This is used for maintaining the random shuffle. The max rowid can be found with:
SELECT MAX(rowid) FROM cache;
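To make the bookkeeping concrete, here's a minimal sketch in Go using the standard database/sql package with a SQLite driver (the mattn/go-sqlite3 driver and the Cache and Open names are my own illustrative choices, not UtahFS's actual API):

package cache

import (
    "database/sql"

    _ "github.com/mattn/go-sqlite3" // assumed driver; any database/sql SQLite driver works
)

// Cache wraps the SQLite database and the in-memory counter n.
type Cache struct {
    db *sql.DB
    n  int64 // current number of rows, equal to the max rowid
}

// Open opens (or creates) the cache database and initializes n from the max rowid.
func Open(path string) (*Cache, error) {
    db, err := sql.Open("sqlite3", path)
    if err != nil {
        return nil, err
    }
    if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS cache (key TEXT NOT NULL PRIMARY KEY, val BYTEA)`); err != nil {
        db.Close()
        return nil, err
    }
    // MAX(rowid) is NULL on an empty table, so scan into a NullInt64; Int64 is 0 when invalid.
    var max sql.NullInt64
    if err := db.QueryRow(`SELECT MAX(rowid) FROM cache`).Scan(&max); err != nil {
        db.Close()
        return nil, err
    }
    return &Cache{db: db, n: max.Int64}, nil
}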
Get
Get operations simply consist of looking up the val of the corresponding key:
SELECT val FROM cache WHERE key = ?;
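Continuing the same sketch, a Get is a single query; scanning into a []byte and checking sql.ErrNoRows distinguishes a miss from an error:

// Get returns the value stored under key, with ok reporting whether it was found.
func (c *Cache) Get(key string) (val []byte, ok bool, err error) {
    err = c.db.QueryRow(`SELECT val FROM cache WHERE key = ?`, key).Scan(&val)
    if err == sql.ErrNoRows {
        return nil, false, nil // cache miss
    } else if err != nil {
        return nil, false, err
    }
    return val, true, nil
}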
Set
A Set operation is accomplished by performing an "inside-out" Fisher-Yates shuffle. First, increment n = n+1. Then choose a random number i such that 1 <= i <= n. If i != n, then move the row at rowid i to rowid n:
UPDATE cache SET rowid = n WHERE rowid = i;
and insert the new row at rowid i:
INSERT OR REPLACE INTO cache (rowid, key, val) VALUES (i, ?, ?);
Note that these queries should be performed in a single transaction for consistency.
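Here's how the full Set could look in the sketch from above (it adds "math/rand" to the earlier imports; for simplicity it assumes the key isn't already present in the cache):

// Set inserts key/val at a uniformly random rowid, maintaining the random shuffle.
// For simplicity, this sketch assumes key is not already present in the cache.
func (c *Cache) Set(key string, val []byte) error {
    tx, err := c.db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback() // harmless if the transaction was already committed

    n := c.n + 1
    i := rand.Int63n(n) + 1 // uniform in [1, n]

    if i != n {
        // Move the row currently at rowid i to the new last slot, rowid n.
        if _, err := tx.Exec(`UPDATE cache SET rowid = ? WHERE rowid = ?`, n, i); err != nil {
            return err
        }
    }
    // Insert the new row at rowid i.
    if _, err := tx.Exec(`INSERT OR REPLACE INTO cache (rowid, key, val) VALUES (?, ?, ?)`, i, key, val); err != nil {
        return err
    }
    if err := tx.Commit(); err != nil {
        return err
    }
    c.n = n
    return nil
}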
Delete
To Delete the row with a given key, first look up the rowid i of the row you wish to delete:
SELECT rowid FROM cache WHERE key = ?;
Then delete that row, and move the row with the highest rowid into that spot:
DELETE FROM cache WHERE rowid = i;
UPDATE cache SET rowid = i WHERE rowid = n;
And finally, decrement n = n-1. Again, these queries should be performed in a transaction for consistency.
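In the same sketch, Delete bundles the lookup, the delete, and the swap into one transaction:

// Delete removes the row for key and keeps rowids 1..n contiguous by moving
// the row with the highest rowid into the freed slot.
func (c *Cache) Delete(key string) error {
    tx, err := c.db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Look up the rowid i of the row to delete.
    var i int64
    err = tx.QueryRow(`SELECT rowid FROM cache WHERE key = ?`, key).Scan(&i)
    if err == sql.ErrNoRows {
        return nil // key not present; nothing to do
    } else if err != nil {
        return err
    }

    if _, err := tx.Exec(`DELETE FROM cache WHERE rowid = ?`, i); err != nil {
        return err
    }
    // Move the last row into the freed slot (matches nothing if i == n).
    if _, err := tx.Exec(`UPDATE cache SET rowid = ? WHERE rowid = ?`, i, c.n); err != nil {
        return err
    }
    if err := tx.Commit(); err != nil {
        return err
    }
    c.n--
    return nil
}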
Eviction
If the cache is too large, a number of rows can be evicted by repeatedly deleting the row with the largest rowid:
DELETE FROM cache WHERE rowid = n;
and decrementing n = n-1.
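In the sketch, eviction could look like the following, where maxEntries is a made-up threshold parameter rather than anything UtahFS actually exposes:

// Evict repeatedly deletes the row with the largest rowid until at most
// maxEntries rows remain.
func (c *Cache) Evict(maxEntries int64) error {
    for c.n > maxEntries {
        if _, err := c.db.Exec(`DELETE FROM cache WHERE rowid = ?`, c.n); err != nil {
            return err
        }
        c.n--
    }
    return nil
}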
Note that eviction won't actually reduce the amount of disk space used by SQLite without running an expensive VACUUM query, but it will allow previously allocated space to be reused.
Implementation
This cache is implemented in the on-disk caching feature of UtahFS.