Compact, memory-contiguous data structures

UnixJunkie · October 20, 2023, 4:08am

Sometimes, I wonder if many functional data structures used in OCaml
could not have a [compact] method (lists, sets, maps, etc).

This method would encode the data-structure in a contiguous memory region,
so that the data-structure has more chance to fit into the CPU cache.

Of course, once you have called the [compact] method, you are not supposed
to modify the data-structure anymore (or, if you do so then the data structure
would become “uncompact” again, as it was before).

Some usage example:

you construct a set once
then many times you want to lookup elements for membership in that set

UnixJunkie · October 22, 2023, 4:14am

maybe @backtracking might have ideas on this

smuenzel · October 22, 2023, 8:34am

The [compact] method doesn’t need to be a function of the data structure, but rather, it can be a function of the whole program.
We can achieve that by incorporating locality information in the [Gc.compact] algorithm. I believe I saw some papers about GCs that can do something like this, but I can’t find them right now.
The advantage here is that we don’t need to modify the data structures at all.

However, it’s not all perfect:

We still have to keep around the header words – so we don’t achieve maximum compaction
If we have something like an n-ary tree, there are several possible layout choices, and we would have to have a heuristic to pick one, rather than a data-structure specific way.

But! I wonder how much it actually matters for the data structure to be physically continuous in memory. Rather, we may want to look at the utilization of cache lines. If we have lots of “gaps”, i.e. individual blocks whose length is not a multiple of the cache line size, then we’re wasting cache memory. So if we design the data structure to be composed of blocks that are a multiple of the cache line size, then we will achieve perfect utilization of the cache (provided we’re aligned to cache lines, probably including the header word).
In that case, I think we should have equivalent or better performance than contiguous layouts.

hummy123 · October 22, 2023, 12:51pm

I don’t know their performance characteristics, but can recall hearing about succint tree (set/map) representastions from this link which is kind of relevant because compact means taking less space. https://tupl.cs.tufts.edu/papers/1904.02809.pdf

There’s also an interesting article on making list traversal 91% faster, not by changing the memory layout (although memory layout is important for it), but through a “value speculation” trick which transforms the list iteration to be like an array iteration. https://www.lortex.org/articles/value-speculation-ocaml/

s.t.s · September 23, 2024, 10:12pm

If what we need is an iterator, then all we really need is to lay out the data stored in the structure in the arrays, that are harmonious with the memory block sizes. Then again, why not design the structures against storing items in the arrays in the first place? That is, the sets and maps should just be indexing items in an array. The array itself could be exposed through a trivial iterator. Of course, we’d benefit more from using a collection of arrays of the right size. The arrays themselves may contain the data directly, and perhaps incorporate some weak links in the representation.

Is there a huge performance penalty when taking such approach?

Also, how does the array of weak references work in OCaml? Do they store data blocks directly, or just some heap values?

Topic		Replies	Views
Compact or succinct data structures in OCaml? Ecosystem	3	1337	March 12, 2018
State of compaction in OCaml 5? Community multicore , compiler	3	604	February 20, 2024
Concestor data structures Learning announce	5	1002	February 23, 2022
OCaml heap "fsck" and forcing collection of unreachable objects Community gc	3	499	May 31, 2023
Will the native compiler ever optimize memory layout of compound types? Learning optimization	0	940	February 21, 2019

Compact, memory-contiguous data structures

Related topics