Ropes: what's "big enough to benefit from cache effects"?
From Wikipedia :
The main drawbacks are overall space utilization and slower indexing, both of which get larger as the tree structure gets larger and deeper. However, many practical uses of indexing involve only iterating over a row, which remains fast as long as the leaf nodes are large enough to benefit from the effects of the cache.
I am implementing a kind of compromise between ropes and strings. They are mostly just strings, except that I flatten the concatenated objects into strings when the concatenated strings are short. There are several reasons for this:
- The benefits of concatenation objects are minimal when the concatenated strings are short (it doesn't take too long to concatenate two strings in their normal form).
- Doing this decreases the scale / depth of the tree (reducing the low frequencies of the ropes).
- Doing this increases the size of the leaf nodes (to make better use of the cache).
However, as the length increases, the benefits of the ropes also decrease, so I would like to find a compromise. Logically, the sweet spot is around where the "leaf nodes are large enough to benefit from cache effects". The problem is, I don't know how big it is.
EDIT: As I was writing this, it occurred to me that the ideal size would be the cache page size, because then the rope only causes cache misses when they happen in line anyway. So my second question is, is this reasoning correct? And is there a cross-platform way to determine the cache page size?
My target language is C ++.
source to share
The limiting case for the string string will be built on top std::list<char>
. This is clearly not very efficient. When iterating, you will probably have one cache miss per sheet / char. As the number of characters per sheet increases, the average misses decrease with a break once your sheet allocation exceeds a single cache line.
It might be nice to have large leaves; memory transfers in cache hierarchies can vary in granularity at different levels. Also, when setting up a mixed set of processors (like consumer PCs), the leaf size that is the higher power of the two will be an integer multiple of the cache line size on other machines. For example. if you are accessing processors with 16 and 32 byte cache lines, 32 bytes is a better choice, since it is always an integer number of cache lines. Losing half a cache line is a shame.
source to share