A Primer on LSM Trees

· Ditsuke.


Log-Structured-Merge (LSM) Trees are a set of data-structures and algorithms for extremely fast writes in database systems.

Core Ideas #

We'll not talk about implementation details for Memtables and SSTables in much depth in this one, but basically:

Compaction #

So, we're optimizing writes by storing value or partial updates to values (depending on database model, K-V or K-Row). And so, we're going to have an increasing amount of fragmentation right?

Say over a day a key is updated 10 times. These writes are amplified to several SSTables depending on the spacing of these updates. Then in the future we have to go through all of these SSTables (filtered, of course, through bloom-filters built into the SSTable) containing updates for this value to compute the actual value. Not very optimal.

This is how we arrive at compaction. Periodically, an LSM tree will take a subset of its SSTables, merge them to eliminate old invalidated updates and #tombstones, and write this merged SSTable* to replace the merge subset.

The exact mechanics of compaction are nuanced and not something I'll go into right now (because I'm not too familiar myself), but in short you use heuristics like merging across generations (levels) to maximise utility.

Note: There may be one or more merged SSTables, with splitting across keysets possibly.

Tombstones #

How do you delete a value in an LSM tree? Well, SSTables are immutable, can't go to them and delete values. What you do instead: add an entry for tombstoning k -> DELETE, for example. This marks a deletion or a tombstone. Then at time of compaction the LSM tree can get rid of the key in the compacted table.

Write Path #

Write -> WAL -> Memtable -> flush (batch, async) -> SSTable

Read Path #

Memtable -> SSTables (new -> old, prune with bloom filter)

Advantages and tradeoffs #

last updated: