Sometimes, I wonder if many functional data structures used in OCaml
could not have a [compact] method (lists, sets, maps, etc).
This method would encode the data-structure in a contiguous memory region,
so that the data-structure has more chance to fit into the CPU cache.
Of course, once you have called the [compact] method, you are not supposed
to modify the data-structure anymore (or, if you do so then the data structure
would become “uncompact” again, as it was before).
Some usage example:
- you construct a set once
- then many times you want to lookup elements for membership in that set
maybe @backtracking might have ideas on this
The [compact] method doesn’t need to be a function of the data structure, but rather, it can be a function of the whole program.
We can achieve that by incorporating locality information in the [Gc.compact] algorithm. I believe I saw some papers about GCs that can do something like this, but I can’t find them right now.
The advantage here is that we don’t need to modify the data structures at all.
However, it’s not all perfect:
- We still have to keep around the header words – so we don’t achieve maximum compaction
- If we have something like an n-ary tree, there are several possible layout choices, and we would have to have a heuristic to pick one, rather than a data-structure specific way.
But! I wonder how much it actually matters for the data structure to be physically continuous in memory. Rather, we may want to look at the utilization of cache lines. If we have lots of “gaps”, i.e. individual blocks whose length is not a multiple of the cache line size, then we’re wasting cache memory. So if we design the data structure to be composed of blocks that are a multiple of the cache line size, then we will achieve perfect utilization of the cache (provided we’re aligned to cache lines, probably including the header word).
In that case, I think we should have equivalent or better performance than contiguous layouts.
I don’t know their performance characteristics, but can recall hearing about succint tree (set/map) representastions from this link which is kind of relevant because compact means taking less space. https://tupl.cs.tufts.edu/papers/1904.02809.pdf
There’s also an interesting article on making list traversal 91% faster, not by changing the memory layout (although memory layout is important for it), but through a “value speculation” trick which transforms the list iteration to be like an array iteration. https://www.lortex.org/articles/value-speculation-ocaml/