Static lifetime

Is it possible to “statically” allocate a value? By this I mean mark a value such that it gets ignored by the GC and lives until the program exits?

This is indeed the purpose of Ancient, which comes with limitations and does not allow you to reclaim the memory until you exit the program (edit: you can in fact manually call the delete function). (I am curious to know how well it works with recent OCaml versions.)

it would be really interesting to learn whether Ocaml forbids blocks outside the heap.

The OCaml runtime has two modes (chosen at compilation) for dealing with so-called “out-of-heap” pointers. In the legacy one that Chet remembers, the GC uses a page table when scanning to be able to tell which pointers it possesses. In the “no-naked-pointers” mode devised more recently for efficiency reasons, the page table is replaced by looking at the colour in the header of the dereferenced value. Out-of-heap values must be preceded by a header with colour black. The no-naked-pointer mode is more restricted, because once a static value is referenced, it can no longer be deallocated, as you never know whether it is still reachable by the GC. This should be enough to support Ancient (edit: except deallocation).

One should verify such intuitions experimentally, before trying to fix them, but I’m not familiar with what OCaml profilers can do…

Excluding large long-lived data from the GC is an old idea. Among recent developments, Nguyen et al. [1] distinguish a “control path” (where the generational hypothesis is assumed to hold) from a “data path” (where values are assumed to follow an “epochal” behaviour (long-lived, similar lifetimes, benefit from locality), and are excluded from GC). They give as motivation so-called “big data” and as figures of pathological GC usage up to 50% of total runtime. I remember reading similar figures from blog posts about large data sets in OCaml. In reality this indeed depends on knobs you can turn on your GC that can result in increased peak memory usage among others. (Assuming infinite available memory, it is even possible to let the GC share drop to 0%.)

@ppedrot reported to me that in a recent experiment with Coq, using an Ancient-like trick to exclude some large, long-lived and rarely-accessed values from being scanned (namely serialising them into bigarrays), they saw an 8% performance improvement across the board in benchmarks.

Multicore, if I understood correctly, aims to support only the no-naked-pointer mode, and I am not sure what the page table will become. Coq currently does some out-of-heap allocation in the VM, and has been adapted to be compatible with the no-naked-pointer mode by wrapping out-of-heap pointers into custom blocks. For scanning its custom stack (which mixes in-heap and out-of-heap values), Coq sets up a custom root-scanning function (caml_scan_roots_hook), which still relies on the page table.

Note that having to wrap out-of-heap pointers in custom blocks is (much!) less expressive: for instance with Ancient you can call List.filter on a statically-allocated list (and correctly get a GC-allocated list of statically-allocated values). With custom blocks you cannot mix in-heap and out-of-heap values in this way.

For a type system to deal with “statically” allocated values, have a look at Rust, which: 1) prevents cycles of reference-counting schemes thanks to uniqueness, 2) can treat GC roots as resources to deal with backpointers at the leaves of the value (cf. the interoperability with SpiderMonkey’s GC in Servo). A point of view that I like is that tracing GCs and static allocation differ fundamentally by how they traverse values for collection: traversing live values for the first one, and traversing values at the moment of their death for the other. This gives them distinct advantages and drawbacks so one can see them as complementary. (See notably [2,3].) Static allocation is interesting for performance in some aspects (no tracing, no read-write barrier, reusability of memory cells, avoids calling the GC at inappropriate times), but I find it even more interesting for interoperability (e.g. exchanging values freely with C or Rust, or applications from that other thread). It is natural to want to mix them in a language.

As far as I understand, developing the runtime capabilities for OCaml to deal with out-of-heap pointers without resorting to an expensive page table is an engineering problem, not a fundamental one. If anyone is interested in this, please contact me.

[1] Nguyen et al., Yak : A High-Performance Big-Data-Friendly Garbage Collector, 2016
[2] Bacon, Cheng and Rajan, A Unified Theory of Garbage Collection, 2004
[3] Shahriyar, Blackburn and Frampton, Down for the Count? Getting Reference Counting Back in the Ring, 2012