Static lifetime

If the value contains mutable pointers to heap blocks, I think the GC must scan the object in order to collect the dead ones, no?

One possibility, which I have no idea if it applies in your situation, would be to use off-the-OCaml heap allocation, such as custom blocks, which aren’t scanned for pointers by the GC (but must not contain pointers to OCaml heap data).

Ah, good point. This brings us to a situation where data may not be mutated arbitrarily, which is problematic.

If only there was a way to statically assign ownership of data for GC purposes. Any object and all its descendants would be guaranteed to have the same owner. Attaching an object to another owner would require a deep copy (a form of message passing). This would allow the existence of an owner of large data with no GC activity, which would be read and copied by other owners into their own space with high GC activity.

No idea what kind of type system could support constraints like “these two objects must have the same owner”.

Sounds like a candidate for a compiler optimization to me.

One should verify such intuitions experimentally, before trying to fix them, but I’m not familiar with what OCaml profilers can do… let

Are you familiar with generational garbage collection? Most programs have data which changes very little; generational GCs put that data into the “old” generation, which is only scanned during major GC cycles, which are pretty rare given enough RAM. Generational GC comes from the 70s, and IIUC OCaml supports it well. Recommended: Understanding the Garbage Collector - Real World OCaml.

Generational GCs also takes care of this issue:

During minor collections, they don’t need to; they just track the few pointers from the old to the new generation, the “intergenerational pointers” (mostly by intercepting writes to the old one via “write barriers”). These tricks are also discussed in the link I gave.

One option is to call into C bindings and allocate your stuff with malloc and return that pointer as an abstract type (just declare type t and say your stub returns a t). I think the GC will ignore those. The downside is I believe you have to access your data through C bindings.

I agree with you completely in the sense that generational GCs can solve this problem, but I thought Ocaml still had two generations (nursery and major heap), and thus, all longer-lived data ends up in the major heap right? And hence it all has to be scanned on each major GC, right? Yes, that’s more infrequent than minor GCs, but having lots of long-lived immutable data in the major heap ought to make major GC take longer, yes?

My memory is that Ocaml doesn’t require that pointers must point into the heap – so blocks could be stored in a read-only text segment and those blocks could point to each other to form a graph, no? You wouldn’t have any type-system-enforced guarantee that code traversing that data would be barred from attempting to mutate the data, and such mutation would yield a memory error, but aside from that, it seems like this should work?

That’s correct.

I believe this has gradually changed, and it is now a requirement (or close to being one).

You actually really want the type system to know this, because otherwise you’d get runtime exceptions. Ideally, OCaml would have a mechanism tracking mutability. Since we don’t have that, if you have a lot of data, you should store if off-heap using C bindings or in Bigarrays.

Hm … I’ll try to argue against this position.

First, any type-system-enforced rule would be insufficient. The purpose of this mechanism is efficiency. So you’re going to want hashtables in there, that get filled-up and then, after “exporting to a text segment” are never modified. But this will be a matter of code, not types. [One could imagine two types (mutable) MHT and (immutable) IHT and a function that mapped from the first to the second, but then right before you export, you’d have to copy the entire graph of data, which would be both unwieldy and require that your copier know enough to preserve graph-structure – all a pain.]

So I’d argue that whatever tool one used, it would (a) check that all types are immutable, BUT (b) have a loophole where the caller could tell the tool about types that, while not immutable, should be treated “as if” they’re immutable…

In short, you can’t avoid runtime errors without compromising utility.

Second: it would be really interesting to learn whether Ocaml forbids blocks outside the heap. If nobody has a definitive judgment on that, I guess it’s time to write some code and figure it out.

I see. That’s far more dynamic than what I was thinking about, which is to have pre-existing read-only data. In that case, the type system is indeed inadequate.

There is precedent for wanting to do this sort of thing: the “unexec”/“undump” facilities in GNUemacs and TeX. In both cases, it’s not just code that gets loaded (though there’s a lot of code) – lots of data (not just bytecode) gets loaded too. Lots of Scheme/LISP implementations did this too, back in the day.

Interestingly, the standard library has this: https://caml.inria.fr/pub/docs/manual-ocaml/libref/Obj.html#VALreachable_words

val reachable_words : t -> int

Computes the total size (in words, including the headers) of
all heap blocks accessible from the argument. Statically
allocated blocks are excluded.

@Since 4.04

So it looks like at least someone thinks that blocks can be statically allocated :slight_smile:

Looking at the code of caml_obj_reachable_words, it seems that the code makes provision for the possibility that pointers from heap-allocated blocks might point outside the heap/nursery, viz.

  if (Is_long(v) || !Is_in_heap_or_young(v)) return Val_int(0);

In native code, structured constants are allocated on the data segment (so “statically allocated”), not on the GC-managed heap.

1 Like

Oh OK. Just to clarify: by ‘native code’ you mean external C code that we would call with OCaml FFI? Or you mean OCaml compiling to native binary?

Sorry, I meant compiling to native code.

1 Like

Is it possible to “statically” allocate a value? By this I mean mark a value such that it gets ignored by the GC and lives until the program exits?

This is indeed the purpose of Ancient, which comes with limitations and does not allow you to reclaim the memory until you exit the program (edit: you can in fact manually call the delete function). (I am curious to know how well it works with recent OCaml versions.)

it would be really interesting to learn whether Ocaml forbids blocks outside the heap.

The OCaml runtime has two modes (chosen at compilation) for dealing with so-called “out-of-heap” pointers. In the legacy one that Chet remembers, the GC uses a page table when scanning to be able to tell which pointers it possesses. In the “no-naked-pointers” mode devised more recently for efficiency reasons, the page table is replaced by looking at the colour in the header of the dereferenced value. Out-of-heap values must be preceded by a header with colour black. The no-naked-pointer mode is more restricted, because once a static value is referenced, it can no longer be deallocated, as you never know whether it is still reachable by the GC. This should be enough to support Ancient (edit: except deallocation).

One should verify such intuitions experimentally, before trying to fix them, but I’m not familiar with what OCaml profilers can do…

Excluding large long-lived data from the GC is an old idea. Among recent developments, Nguyen et al. [1] distinguish a “control path” (where the generational hypothesis is assumed to hold) from a “data path” (where values are assumed to follow an “epochal” behaviour (long-lived, similar lifetimes, benefit from locality), and are excluded from GC). They give as motivation so-called “big data” and as figures of pathological GC usage up to 50% of total runtime. I remember reading similar figures from blog posts about large data sets in OCaml. In reality this indeed depends on knobs you can turn on your GC that can result in increased peak memory usage among others. (Assuming infinite available memory, it is even possible to let the GC share drop to 0%.)

@ppedrot reported to me that in a recent experiment with Coq, using an Ancient-like trick to exclude some large, long-lived and rarely-accessed values from being scanned (namely serialising them into bigarrays), they saw an 8% performance improvement across the board in benchmarks.

Multicore, if I understood correctly, aims to support only the no-naked-pointer mode, and I am not sure what the page table will become. Coq currently does some out-of-heap allocation in the VM, and has been adapted to be compatible with the no-naked-pointer mode by wrapping out-of-heap pointers into custom blocks. For scanning its custom stack (which mixes in-heap and out-of-heap values), Coq sets up a custom root-scanning function (caml_scan_roots_hook), which still relies on the page table.

Note that having to wrap out-of-heap pointers in custom blocks is (much!) less expressive: for instance with Ancient you can call List.filter on a statically-allocated list (and correctly get a GC-allocated list of statically-allocated values). With custom blocks you cannot mix in-heap and out-of-heap values in this way.

For a type system to deal with “statically” allocated values, have a look at Rust, which: 1) prevents cycles of reference-counting schemes thanks to uniqueness, 2) can treat GC roots as resources to deal with backpointers at the leaves of the value (cf. the interoperability with SpiderMonkey’s GC in Servo). A point of view that I like is that tracing GCs and static allocation differ fundamentally by how they traverse values for collection: traversing live values for the first one, and traversing values at the moment of their death for the other. This gives them distinct advantages and drawbacks so one can see them as complementary. (See notably [2,3].) Static allocation is interesting for performance in some aspects (no tracing, no read-write barrier, reusability of memory cells, avoids calling the GC at inappropriate times), but I find it even more interesting for interoperability (e.g. exchanging values freely with C or Rust, or applications from that other thread). It is natural to want to mix them in a language.

As far as I understand, developing the runtime capabilities for OCaml to deal with out-of-heap pointers without resorting to an expensive page table is an engineering problem, not a fundamental one. If anyone is interested in this, please contact me.

[1] Nguyen et al., Yak : A High-Performance Big-Data-Friendly Garbage Collector, 2016
[2] Bacon, Cheng and Rajan, A Unified Theory of Garbage Collection, 2004
[3] Shahriyar, Blackburn and Frampton, Down for the Count? Getting Reference Counting Back in the Ring, 2012

2 Likes

From a very quick search in the literature: “ownership types” or “session types”, maybe.

If you can store your long-leaved data into a bigarray, I think you would reach the effect that you were looking for (no more GC scanning of this data).

This was once advised to me by Oleg, for some performance-critical section of some code.

It would be nice if this use-case is supported, for high performance parallel computing in OCaml: create some data, mark it as read-only then fork the workers.