Sometimes people say that for latency-critical code you want to “minimize the number of allocations” in order to avoid triggering a garbage collection cycle.

I basically do not understand how one does this or what heuristics to follow, and I would like an overview of what triggers an allocation.

Let me explain my understanding first, which may be naive.

My understanding is that there is one basic 8-byte word (on 64-bit systems) which is at the foundation of everything: everything that lives on the OCaml call stack is represented by this 8-byte word. There is one bit which marks it as either a 63-bit integer or a pointer to another data type. This other data type always lives on the heap, as I understand it. (Maybe almost always, modulo some compiler optimizations?)
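If I understand the representation right, this can be poked at (unsafely, purely for illustration) with the `Obj` module, which exposes whether a value is an immediate tagged integer or a pointer to a heap block:

```ocaml
(* Unsafe introspection of the tagged-word representation, for
   illustration only: immediates such as ints are stored directly in the
   word, while floats, tuples, etc. are pointers to heap blocks. *)
let () =
  assert (Obj.is_int (Obj.repr 42));           (* immediate: tagged int *)
  assert (Obj.is_int (Obj.repr 'c'));          (* chars are immediates too *)
  assert (not (Obj.is_int (Obj.repr 3.14)));   (* boxed float on the heap *)
  assert (not (Obj.is_int (Obj.repr (1, 2)))); (* heap-allocated block *)
  print_endline "representation checks passed"
```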

Because all values other than integers are represented by pointers to the heap, it seems like defining literally any value other than an integer would tend to cause a new object to be allocated on the heap. Something as simple as `let z = x +. y` would potentially incur a new float allocation to store `z`, although if `z` is immediately used directly to define something else, maybe the compiler can use a local register to store `z`, and some user knowledge of the runtime behavior is necessary here as this is not part of the language proper.
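I suppose one could try to observe this directly: `Gc.minor_words` reports the cumulative number of words allocated on the minor heap, so the difference across a computation shows what it allocated. The result presumably depends on backend (bytecode vs. native, flambda) and flags, so this is a diagnostic, not a guarantee, and the helper name here is my own:

```ocaml
(* Measure minor-heap words allocated while running a thunk.
   Results vary with compiler, backend, and optimization flags. *)
let words_allocated f =
  let before = Gc.minor_words () in
  let result = f () in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  (* Sys.opaque_identity keeps the compiler from constant-folding x +. y. *)
  let x = Sys.opaque_identity 1.0 and y = Sys.opaque_identity 2.0 in
  let z, words = words_allocated (fun () -> x +. y) in
  Printf.printf "z = %g, minor words allocated: %g\n" z words
```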

As far as I understand it, the garbage collector generally is not guaranteed to update values in place. Even if one writes `a.(i) <- a.(i) +. b.(i)`, one could still allocate a new float on the heap to store `a.(i) +. b.(i)` and then update the pointer at `a.(i)` to point to this; there is no way of guaranteeing that `a.(i) <- a.(i) +. b.(i)` actually updates the location where `a.(i)` currently points.
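Though, checking with `Obj`, it looks like ordinary `float array`s already get a dedicated flat representation (under the default flat-float-array configuration of the runtime), with elements stored inline rather than as pointers to boxed floats, which may undercut my worry here:

```ocaml
(* Under the default runtime configuration, arrays of floats get a flat
   (unboxed) layout marked with Obj.double_array_tag, so elements live
   inline in the array block rather than behind pointers. *)
let () =
  let a = [| 1.0; 2.0; 3.0 |] in
  assert (Obj.tag (Obj.repr a) = Obj.double_array_tag);
  (* An int array, by contrast, uses the ordinary block representation. *)
  let b = [| 1; 2; 3 |] in
  assert (Obj.tag (Obj.repr b) = 0);
  print_endline "float arrays are stored flat"
```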

So, as I write this I’m starting to suspect that I’m “technically” correct in that the language doesn’t offer any guarantees on this, but in practice expert users who are more familiar with the runtime understand what optimizations will be made under the hood.

For example, if one uses the unboxed `floatarray`s, one might expect `a.(i) <- a.(i) +. b.(i)` to avoid any allocations at all, as this compiler optimization is obvious. But if one writes `a.(i) <- f a.(i) b.(i)` where `f` is a big complicated function which will not in general be inlined, I might imagine (without much knowledge of how the runtime operates) that since `f` is compiled to operate on boxed elements, `a.(i)` and `b.(i)` will first be boxed, then passed to `f`; then `f` could potentially allocate its result on the heap, and then the value would be copied out of the heap location into `a.(i)`. So the use of the unboxed `floatarray`s is possibly counterproductive here, as it increases the amount of copying.
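Here is a sketch of the experiment I have in mind, again using `Gc.minor_words`. The names are my own, and `Sys.opaque_identity` around a trivial `f` is just a stand-in for a big function the optimizer cannot inline; the actual numbers will vary by compiler and flags:

```ocaml
(* Compare minor-heap allocation of a direct floatarray update against
   the same update routed through a function the compiler cannot see
   through. Numbers depend on backend and optimization level. *)
let words_during thunk =
  let before = Gc.minor_words () in
  thunk ();
  Gc.minor_words () -. before

let n = 10_000
let a = Float.Array.make n 1.0
let b = Float.Array.make n 2.0

let direct () =
  for i = 0 to n - 1 do
    Float.Array.set a i (Float.Array.get a i +. Float.Array.get b i)
  done

(* f stands in for a big, non-inlinable function; Sys.opaque_identity
   hides its definition from the optimizer. *)
let f = Sys.opaque_identity (fun x y -> x +. y)

let through_f () =
  for i = 0 to n - 1 do
    Float.Array.set a i (f (Float.Array.get a i) (Float.Array.get b i))
  done

let () =
  Printf.printf "direct:    %g minor words\n" (words_during direct);
  Printf.printf "through f: %g minor words\n" (words_during through_f)
```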

If anyone wants to clarify my understanding so I can better write allocation-efficient code, I’ll appreciate your input. I think my basic confusion is this: a given algorithm requires defining certain intermediary values that take us from the input to the final result. If any of those intermediary definitions can potentially end up on the heap, then it’s not clear that any minor restructuring of the algorithm can get around those definitions, and one must throw out the algorithm and start over. Is this an exaggeration or is this close to how it works?