On-Disk Caching with SQLite
2 Jul 2020
Recently, a project I was working on required a cache that would ideally be capable of storing tens of millions of entries, reaching into the multiple-terabyte range. At this scale, using a single spinning disk is significantly cheaper than storing everything in memory, which Redis or Memcached would require. Another classic KV store, LevelDB, also ends up being impractical because it writes the value for each key to disk multiple times. If your values are small, LevelDB can be really performant, but in my case the values are moderately sized (tens of kilobytes) and writing them to disk multiple times quickly becomes user-visible. Finally, I expected that storing this many entries on disk as individual files in a folder would cause issues with inode exhaustion, and in any case the ACID guarantees that a filesystem provides have never been super clear to me.
This blog post describes how to build an on-disk cache with SQLite that's capable of evicting entries when the cache gets too large. SQLite is a bit more machinery than I would've initially wanted, but it scales to the size I want, provides ACID guarantees, and is surprisingly performant. The only difficulty is that it doesn't have a good way to randomly select rows for eviction.
The underlying idea to get around this is to maintain a random shuffle of keys in the database, so that 'randomly evicting' simply consists of deleting the last N rows. The reason for random eviction (instead of LRU) is that I didn't want Get operations to cause any writes to disk, which an LRU cache would need in order to keep track of usage.
All operations are fairly fast, having a runtime that's logarithmic in the number of rows.
Schema
The table schema:
CREATE TABLE cache (key TEXT NOT NULL PRIMARY KEY, val BYTEA);
The PRIMARY KEY clause creates an index on the key column, which enables efficient lookups by key.
Startup
When the process using the cache starts up, it first needs to get the max rowid and store it in a variable called n (using 0 if the table is empty, since MAX returns NULL). This is used for maintaining the random shuffle. The max rowid can be found with:
SELECT MAX(rowid) FROM cache;
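To make the bookkeeping concrete, here's a minimal sketch in Go using the standard database/sql package with a SQLite driver (the mattn/go-sqlite3 driver and the Cache and Open names are my own illustrative choices, not UtahFS's actual API):

package cache

import (
    "database/sql"

    _ "github.com/mattn/go-sqlite3" // assumed driver; any database/sql SQLite driver works
)

// Cache wraps the SQLite database and the in-memory counter n.
type Cache struct {
    db *sql.DB
    n  int64 // current number of rows, equal to the max rowid
}

// Open opens (or creates) the cache database and initializes n from the max rowid.
func Open(path string) (*Cache, error) {
    db, err := sql.Open("sqlite3", path)
    if err != nil {
        return nil, err
    }
    if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS cache (key TEXT NOT NULL PRIMARY KEY, val BYTEA)`); err != nil {
        db.Close()
        return nil, err
    }
    // MAX(rowid) is NULL on an empty table, so scan into a NullInt64; Int64 is 0 when invalid.
    var max sql.NullInt64
    if err := db.QueryRow(`SELECT MAX(rowid) FROM cache`).Scan(&max); err != nil {
        db.Close()
        return nil, err
    }
    return &Cache{db: db, n: max.Int64}, nil
}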
Get
Get operations simply consist of looking up the val of the corresponding key:
SELECT val FROM cache WHERE key = ?;
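Continuing the same sketch, a Get is a single query; scanning into a []byte and checking sql.ErrNoRows distinguishes a miss from an error:

// Get returns the value stored under key, with ok reporting whether it was found.
func (c *Cache) Get(key string) (val []byte, ok bool, err error) {
    err = c.db.QueryRow(`SELECT val FROM cache WHERE key = ?`, key).Scan(&val)
    if err == sql.ErrNoRows {
        return nil, false, nil // cache miss
    } else if err != nil {
        return nil, false, err
    }
    return val, true, nil
}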
Set
A Set operation is accomplished by performing an "inside-out" Fisher-Yates shuffle. First, increment n = n+1. Then choose a random number i such that 1 <= i <= n. If i != n, then move the row at rowid i to rowid n:
UPDATE cache SET rowid = n WHERE rowid = i;
and insert the new row at rowid i:
INSERT OR REPLACE INTO cache (rowid, key, val) VALUES (i, ?, ?);
Note that these queries should be performed in a single transaction for consistency.
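Here's how the full Set could look in the sketch from above (it adds "math/rand" to the earlier imports; for simplicity it assumes the key isn't already present in the cache):

// Set inserts key/val at a uniformly random rowid, maintaining the random shuffle.
// For simplicity, this sketch assumes key is not already present in the cache.
func (c *Cache) Set(key string, val []byte) error {
    tx, err := c.db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback() // harmless if the transaction was already committed

    n := c.n + 1
    i := rand.Int63n(n) + 1 // uniform in [1, n]

    if i != n {
        // Move the row currently at rowid i to the new last slot, rowid n.
        if _, err := tx.Exec(`UPDATE cache SET rowid = ? WHERE rowid = ?`, n, i); err != nil {
            return err
        }
    }
    // Insert the new row at rowid i.
    if _, err := tx.Exec(`INSERT OR REPLACE INTO cache (rowid, key, val) VALUES (?, ?, ?)`, i, key, val); err != nil {
        return err
    }
    if err := tx.Commit(); err != nil {
        return err
    }
    c.n = n
    return nil
}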
Delete
To Delete the row with a given key, first look up the rowid i of the row you wish to delete:
SELECT rowid FROM cache WHERE key = ?;
Then delete that row, and move the row with the highest rowid into that spot:
DELETE FROM cache WHERE rowid = i;
UPDATE cache SET rowid = i WHERE rowid = n;
And finally, decrement n = n-1. Again, these queries should be performed in a transaction for consistency.
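In the same sketch, Delete bundles the lookup, the delete, and the swap into one transaction:

// Delete removes the row for key and keeps rowids 1..n contiguous by moving
// the row with the highest rowid into the freed slot.
func (c *Cache) Delete(key string) error {
    tx, err := c.db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Look up the rowid i of the row to delete.
    var i int64
    err = tx.QueryRow(`SELECT rowid FROM cache WHERE key = ?`, key).Scan(&i)
    if err == sql.ErrNoRows {
        return nil // key not present; nothing to do
    } else if err != nil {
        return err
    }

    if _, err := tx.Exec(`DELETE FROM cache WHERE rowid = ?`, i); err != nil {
        return err
    }
    // Move the last row into the freed slot (matches nothing if i == n).
    if _, err := tx.Exec(`UPDATE cache SET rowid = ? WHERE rowid = ?`, i, c.n); err != nil {
        return err
    }
    if err := tx.Commit(); err != nil {
        return err
    }
    c.n--
    return nil
}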
Eviction
If the cache is too large, a number of rows can be evicted by repeatedly deleting the row with the largest rowid:
DELETE FROM cache WHERE rowid = n;
and decrementing n = n-1.
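In the sketch, eviction could look like the following, where maxEntries is a made-up threshold parameter rather than anything UtahFS actually exposes:

// Evict repeatedly deletes the row with the largest rowid until at most
// maxEntries rows remain.
func (c *Cache) Evict(maxEntries int64) error {
    for c.n > maxEntries {
        if _, err := c.db.Exec(`DELETE FROM cache WHERE rowid = ?`, c.n); err != nil {
            return err
        }
        c.n--
    }
    return nil
}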
Note that eviction won't actually reduce the amount of disk space used by SQLite without running an expensive VACUUM query, but it will allow previously allocated space to be reused.
Implementation
This cache is implemented in the on-disk caching feature of UtahFS.