How an aligned compression strategy ensures that 90% of the data read comes from a single sstable
I'm trying to understand how the Leveled Compaction Strategy in Cassandra works, which guarantees that 90% of all reads are satisfied from a single sstable.
From DataStax Doc:
new sstables are added to the first level, L0, and are immediately compacted with sstables in L1. When L1 fills up, additional sstables go up to L2. Subsequent sstables generated in L1 will be compressed with sstables in L2 with which they overlap.
source to share
LeveledCompactionStrategy (LCS) in Cassandra implements the internals of LevelDB. You can check the exact implementation information in the DEF document for level design.
To give you a simple explanation, consider the following points:
- Each sstable is created when a fixed (relatively small) size is reached. By default, L0 receives 5MB of file files, and each subsequent level is 10x the size . (in L1 you will have 50MB of data, L2 500MB, etc.).
- Stands are created by ensuring they don't overlap
- When the level fills up, compaction starts and the stables from level L rise to level L + 1. So in L1 you will have 50 MB in ~ 10 files, L2 500 MB in ~ 100 files, etc.
Bold is the relevant data that justifies 90% of the reading from the same file (sstable). Let's do the math together and things will become clearer (hopefully :)
Suppose you have keys A, B, C, D, E in L0 and each key accepts 1MB of data.
Then we insert the F key. As level 0 fills up, the seal will create a file with [A, B, C, D, E] at level 1, and F will remain at level 0.
This is ~ 83% of the data in 1 file in L1
Then we insert G, H, I, J, and K. So L0 fills up again, L1 gets a new sstable with [I, G, H, I, J].
Currently K in L0, [A, B, C, D, E] and [I, G, H, I, J] in L1
And that's ~ 90% of the data in L1 :)
If we continue to insert keys, we'll work around the same way, so you'll get 90% of read messages from roughly the same / sstable file.
More details and details (what happens with updates and tombstones) information is given in this paragraph at the referenced link (the sizes for the seal choices are different as they are the default values ββfor LevelDB and not C * s):
When the size of the L level exceeds its limit, we compress it in the background thread. Compression selects a file from level L and all overlapping files from the next level L + 1. Note: if a file of level L overlaps only part of the file at level (L + 1), the entire file at level - (L + 1) is used as input for compaction and will be discarded after compaction. In addition, since level 0 is special (files in it can overlap), we handle compression from level 0 to level 1 on purpose: level 0 compression can display more than one level 0 file in case some of these files overlap each other.
The seal combines the contents of the selected files to create a sequence of level (L + 1) files. We switch to creating a new level file (L + 1) after the current output file has reached the target file size (2MB). We also switch to a new output file when the key range of the current output file has grown enough to span more than ten level files (L + 2). This last rule ensures that subsequent compaction of the level (L + 1) file does not display too much data from level (L + 2).
Hope this helps!
source to share