Basic questions about allocation / minimizing allocations

Sometimes people say that for critical code which cannot have any latency you want to “minimize the number of allocations” in order to prevent triggering a garbage collection cycle.

I basically do not understand how one does this or what heuristics to follow and I would like an overview of what triggers an allocation.

Let me explain my understanding first, which may be naive.
My understanding is that there is one basic 8 byte word which is at the foundation of everything, everything that lives on the OCaml call stack is represented by this 8 byte word. There is one bit which marks it as a 63 bit integer or a pointer to another data type. This other datatype always lives on the heap, as I understand it. (Maybe almost always modulo some compiler optimizations?)

Because all values other than integers are represented by pointers to the heap, it seems like defining literally any value other than an integer would tend to cause a new object to be allocated on the heap. Something as simple as let z = x +. y would potentially incur a new float allocation to store z, although if z is immediately used directly to define something else, maybe the compiler can use a local register to store z, and some user knowledge of the runtime behavior is necessary here as this is not part of the language proper.

As far as I understand it, the garbage collector generally is not guaranteed to update values in place. Even if one writes a.(i) <- a.(i) +. b.(i), one still could allocate a new float on the heap to store a.(i)+.b(i) and then update the pointer at a.(i) to point to this, there is no way of guaranteeing that a.(i) <- a.(i) + b.(i) actually updates the location where a.(i) currently points.

So, as I write this I’m starting to suspect that I’m “technically” correct in that the language doesn’t offer any guarantees on this, but in practice expert users who are more familiar with the runtime understand what optimizations will be made under the hood.

For example, if one uses the unboxed floatarray’s, one might expect a.(i) <- a.(i) +. b.(i) to avoid any allocations at all, as this compiler optimization is obvious, but if one writes a.(i) <- f a.(i) b.(i) where f is a big complicated function which will not in general be inlined, I might imagine (without much knowledge of how the runtime operates) that since f is compiled to operate on boxed elements, a.(i) and b.(i) will be first boxed, then passed to f, then f could potentially allocate its result on the heap, and then the value would be copied out of the heap location into a.(i), and so the use of the unboxed floatarray’s is possibly counterproductive here as it increases the amount of copying.

If anyone wants to clarify my understanding so I can better write allocation-efficient code, I’ll appreciate your input. I think my basic confusion seems to be: given a certain algorithm one must define certain intermediary values that allow us to compute the final result from the input, and if any intermediary value definition can potentially live on the stack then it’s not clear that any minor restructing of the algorithm can get around those definitions, and one must throw out the algorithm and start over. Is this an exaggeration or is this close to how it works?

2 Likes

I do not know the answer to your question, but I know a way of investigating them. There is a module Gc that can be used to check stats about the GC, as for instance the number of allocations. This can be used to compare programs.

1 Like

Yes, that’s correct, knowing that all scalar types (booleans, integers, characters, constant constructors, ()) are represented as integers at runtime.

That’s indeed the case: floating-point numbers don’t fit in 63 bits so they are boxed. However, the native-code compiler is able to optimize away many of the intermediate allocations in first-order code (see Unboxed floats in OCaml | LexiFi for a somewhat outdated blog post about the topic).

In fact, the runtime never updates an immutable value, like a float, in-place: it would not be sound, as the value may be shared.

That is correct. However, a lot hinges on how much inlining takes place, so it is a bit complicated to give a general rule.

Something that helps me quite a bit when analyzing how much allocation takes place in a particular function is to look at the output of ocamlopt -dcmm -c foo.ml: allocations appear as (alloc ...) nodes.

cc @vlaviron as an expert in compiler optimizations and who will be able to give you a more complete answer.

Cheers,
Nicolas

3 Likes

I also recommend the talk Pourquoi écrire du C quand on peut faire pire en OCaml ? (slides) by @chambart. It’s in french but looking at the code in the slides can help.

2 Likes