A pattern I’ve seen a couple of times for immutable data:
- Generate the data using a batch process
- Store the data in an indexed structure (such as an SSTable, a sorted string table)
- Expose the structure through an API
The result is a key-value store with extremely high read performance.
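To make the “indexed structure” step concrete, here is a minimal in-memory sketch of the idea behind an SSTable, assuming a made-up record layout: the batch job sorts the data once, writes it sequentially, and keeps a sparse index of every Nth key; reads binary-search that index and scan a single block. The layout, `build_sstable`, `SSTableReader`, and `index_every` are hypothetical names for illustration, not the actual format used by LevelDB, BigTable, or Manhattan.

```python
import bisect
import io
import struct
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout (NOT the real LevelDB/BigTable format):
# [key_len: u32][val_len: u32][key bytes][value bytes], records sorted by key.
_HEADER = struct.Struct(">II")


def build_sstable(records: Iterable[Tuple[bytes, bytes]], out: io.BytesIO,
                  index_every: int = 16) -> List[Tuple[bytes, int]]:
    """Batch step: sort once, write records sequentially, return a sparse index."""
    index: List[Tuple[bytes, int]] = []
    # dict() keeps the last value for a duplicate key; sorted() orders by key.
    for i, (key, value) in enumerate(sorted(dict(records).items())):
        if i % index_every == 0:
            index.append((key, out.tell()))  # byte offset of every Nth record
        out.write(_HEADER.pack(len(key), len(value)))
        out.write(key)
        out.write(value)
    return index


class SSTableReader:
    """Read-only lookups: binary-search the sparse index, then scan one block."""

    def __init__(self, data: bytes, index: List[Tuple[bytes, int]]):
        self._data = data
        self._keys = [k for k, _ in index]
        self._offsets = [off for _, off in index]

    def get(self, key: bytes) -> Optional[bytes]:
        # Last indexed key <= key; if the key exists, it lives in that block.
        i = bisect.bisect_right(self._keys, key) - 1
        if i < 0:
            return None
        pos = self._offsets[i]
        end = self._offsets[i + 1] if i + 1 < len(self._offsets) else len(self._data)
        while pos < end:
            key_len, val_len = _HEADER.unpack_from(self._data, pos)
            pos += _HEADER.size
            k = self._data[pos:pos + key_len]
            pos += key_len
            v = self._data[pos:pos + val_len]
            pos += val_len
            if k == key:
                return v
            if k > key:
                return None  # sorted order: we have passed where key would be
        return None


buf = io.BytesIO()
idx = build_sstable(
    [(f"user:{n:04d}".encode(), str(n * n).encode()) for n in range(100)],
    buf, index_every=8)
table = SSTableReader(buf.getvalue(), idx)
print(table.get(b"user:0042"))  # b'1764'
print(table.get(b"user:9999"))  # None
```

Because the file never changes after the batch run, reads need no locking and the sparse index stays small (one entry per 8 records here). Production formats such as LevelDB’s layer more on top, for example block compression, checksums, and optional Bloom filters.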
The first time I heard about this was in the context of Twitter’s Manhattan database. Recently, I saw the pattern again at a different company. Ilya Grigorik wrote about it several years ago in the context of log-structured data, BigTable and LevelDB.
My takeaway: this pattern is worth considering if:
- my current store is having issues (no need to fix what’s not broken)
- I have heavy read traffic
- I can tolerate latency on updates
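Tolerating update latency is what keeps the serving side simple: the API can hand out the latest complete snapshot and swap in a new one only when the next batch run finishes. The sketch below assumes a Python dict stands in for the batch output (a real deployment would load something like a sorted file), and `SnapshotStore`, `get`, and `publish` are hypothetical names.

```python
import threading
from types import MappingProxyType
from typing import Mapping, Optional


class SnapshotStore:
    """Hypothetical API: serve reads from the latest immutable batch snapshot."""

    def __init__(self, initial: Mapping[str, str]):
        self._lock = threading.Lock()  # serializes publishers only
        self._version = 0
        self._current = MappingProxyType(dict(initial))  # read-only view

    def get(self, key: str) -> Optional[str]:
        # Take the reference once; the snapshot it points to never changes.
        # Assumes reference assignment is atomic, as it is in CPython.
        snapshot = self._current
        return snapshot.get(key)

    def publish(self, new_data: Mapping[str, str]) -> int:
        """Called when a batch run finishes: replace the whole snapshot at once."""
        frozen = MappingProxyType(dict(new_data))  # copy so callers can't mutate it
        with self._lock:
            self._current = frozen
            self._version += 1
            return self._version


store = SnapshotStore({"a": "1"})
print(store.get("a"))                  # 1
store.publish({"a": "2", "b": "3"})
print(store.get("a"), store.get("b"))  # 2 3
```

Readers never wait on the batch job; the cost is that a change only becomes visible after the next `publish`, which is exactly the latency the list above says I need to be able to accept.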
The log-structured framing makes me think the pattern might open the door to writes, too. Twitter’s post mentions a “heavy read, light write” use case, although for that case it describes a B-tree structure rather than a simple sorted file. Grigorik’s post mentions that BigTable uses a “memtable” to handle writes.
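A rough sketch of how a memtable lets an otherwise immutable store accept writes: new writes and deletes land in an in-memory table, reads check it before the sorted data, and once it grows past a threshold it is merged into a fresh sorted run. `SortedRun`, `MemtableStore`, `flush_threshold`, and the tombstone marker are hypothetical. Real systems typically write each flushed memtable out as a new sorted file and merge files later during compaction; this sketch folds the flush and the merge into one step to stay short.

```python
import bisect
from typing import Dict, Iterator, Optional, Tuple

_TOMBSTONE = object()  # hypothetical marker: "this key was deleted"


class SortedRun:
    """Immutable sorted key/value run; stands in for an on-disk sorted file."""

    def __init__(self, items: Dict[str, str]):
        self._keys = sorted(items)
        self._values = [items[k] for k in self._keys]

    def get(self, key: str) -> Optional[str]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def items(self) -> Iterator[Tuple[str, str]]:
        return zip(self._keys, self._values)


class MemtableStore:
    """Hypothetical store: recent writes in memory, everything else in a sorted run."""

    def __init__(self, base: Dict[str, str], flush_threshold: int = 4):
        self._run = SortedRun(base)
        self._memtable: Dict[str, object] = {}
        self._flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        self._memtable[key] = value
        self._maybe_flush()

    def delete(self, key: str) -> None:
        # Deletes must be recorded too, or reads would fall through to the old run.
        self._memtable[key] = _TOMBSTONE
        self._maybe_flush()

    def get(self, key: str) -> Optional[str]:
        # Newest data wins: check the memtable before the immutable run.
        if key in self._memtable:
            value = self._memtable[key]
            return None if value is _TOMBSTONE else value
        return self._run.get(key)

    def _maybe_flush(self) -> None:
        if len(self._memtable) >= self._flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Build a brand-new sorted run; the old run is never modified in place.
        merged = dict(self._run.items())
        for key, value in self._memtable.items():
            if value is _TOMBSTONE:
                merged.pop(key, None)
            else:
                merged[key] = value
        self._run = SortedRun(merged)
        self._memtable = {}


store = MemtableStore({"a": "1", "b": "2"}, flush_threshold=2)
store.put("c", "3")
print(store.get("c"))  # 3 (served from the memtable)
store.delete("a")      # second pending write triggers a flush
print(store.get("a"))  # None
print(store.get("b"))  # 2
```

The sorted data stays immutable, so reads keep most of their speed; writes pay for it later, when the memtable is merged.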
Note that the Web’s IndexedDB has an access pattern similar to an SSTable’s. If I think of remote updates as infrequent writes, then the pattern described here might be a common use case for the Web, which might bring this full circle: Google crawls the Web in a batch process and updates a read-heavy index.