LSM-Trees vs B-Trees, Honestly

Write amplification, read amplification, space amplification — you only get to optimize two. A practical guide to picking a storage engine without the vendor hand-waving.

Storage-engine debates get religious fast, but the honest version fits on an index card. There are three costs — write amplification, read amplification, and space amplification — and no design lets you minimize all three at once. Pick the two that matter for your workload and accept the third.

# the RUM conjecture in practice

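The RUM conjecture is the formal version of that index card: of read, update, and memory overhead, you can tune for two and pay with the third. B-trees update in place, so a point read is a handful of page fetches (roughly one per tree level) and space overhead stays modest. The price is write amplification: changing a few bytes still means rewriting a whole page, and random writes scatter I/O across the disk. LSM-trees flip this. Writes land in an in-memory memtable and get flushed sequentially as sorted files, which is beautiful for ingest. But a read may have to check the memtable and then several levels on disk, and compaction rewrites the same data over and over as it merges those files.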
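To make the LSM side of that trade concrete, here is a toy sketch, not any real engine's design. The `ToyLSM` class, its `memtable_limit` parameter, and the single "merge everything" compaction are all assumptions made for illustration; real engines use leveled or tiered compaction, Bloom filters, and a write-ahead log. It counts how many sorted runs a read has to probe and how many entries get written to "disk" in total.

```python
import bisect


class ToyLSM:
    """Toy LSM-tree: a dict memtable flushed into immutable sorted runs.

    Hypothetical illustration only. No WAL, no Bloom filters, no levels.
    """

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []            # sorted lists of (key, value), newest run first
        self.entries_written = 0  # entries written by flushes and compactions

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A flush is one sequential write of the sorted memtable contents.
        run = sorted(self.memtable.items())
        self.runs.insert(0, run)
        self.entries_written += len(run)
        self.memtable = {}

    def get(self, key):
        """Return (value, runs_probed). Newer data shadows older data."""
        if key in self.memtable:
            return self.memtable[key], 0
        probes = 0
        for run in self.runs:
            probes += 1
            # (key,) sorts just before (key, value), so this finds the key's slot.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1], probes
        return None, probes

    def compact(self):
        """Merge all runs into one, keeping the newest value per key."""
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer values win
            merged.update(run)
        run = sorted(merged.items())
        self.runs = [run]
        self.entries_written += len(run)  # compaction rewrites existing data


db = ToyLSM(memtable_limit=4)
for i in range(16):
    db.put(i, f"v{i}")

print(db.get(0))                 # oldest key: every run is probed
db.compact()
print(db.get(0))                 # after compaction: a single run to probe
print(db.entries_written / 16)   # entries written per entry inserted
```

Before compaction the oldest key costs one probe per run; after compaction it costs one, but every entry has been written twice. That is the trade in miniature: compaction buys read amplification down by spending write amplification.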

Benchmarks lie by omission. Ask what the write pattern was, how large the dataset was relative to memory, and whether compaction had caught up.

# choosing on purpose

Write-heavy, append-mostly, time-series shaped? Reach for an LSM engine and tune compaction. Read-heavy with hot random point lookups and tight latency SLOs? A B-tree will usually treat you better. The wrong move is picking based on which database has the nicer landing page.

Whatever you choose, measure at steady state — after compaction has run and the cache is warm. The first ten minutes of a benchmark tell you almost nothing about the next ten months.
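One rough way to decide when "steady state" has begun is to compare throughput in adjacent time windows and only start trusting numbers once they stop moving. The sketch below assumes you already have a list of per-interval throughput samples. The function name `steady_state_start`, the `window` and `tolerance` parameters, and the 5% default are illustrative choices, not a standard.

```python
from statistics import mean


def steady_state_start(samples, window=5, tolerance=0.05):
    """Return the first index where throughput looks stable, or None.

    A position counts as stable when the mean of the window starting there
    and the mean of the window right after it differ by at most `tolerance`
    (relative to the first window). Heuristic only: a slow, steady drift can
    still slip under the tolerance.
    """
    for start in range(0, len(samples) - 2 * window + 1):
        a = mean(samples[start:start + window])
        b = mean(samples[start + window:start + 2 * window])
        if a > 0 and abs(a - b) / a <= tolerance:
            return start
    return None


# Made-up ops/sec samples: a fast start that decays once background work kicks in.
samples = [1000, 800, 600] + [400] * 10
start = steady_state_start(samples, window=3)
print(start)                  # index where the plateau begins
print(mean(samples[start:]))  # report throughput from the plateau only
```

Reporting only the samples from `start` onward keeps the flattering warm-up numbers out of the headline figure. In a real run you would want windows long enough to span at least a few compaction cycles.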