[ANN] A dynamic checker for detecting naked pointers

Please do note that this isn’t a performance improvement for OCaml; it is very much a correctness fix. The failure case is as follows:

  • a naked pointer is created using malloc on the C heap and held in the OCaml heap
  • the external region is freed, but the naked pointer is still held in the OCaml heap
  • the GC mallocs to expand, and that recently freed C memory becomes part of the OCaml heap
  • the GC then follows the naked pointer by treating it as an OCaml value, since the page table indicates that it is within the OCaml heap. However, the memory the naked pointer is aimed at is not necessarily a valid OCaml value, as it was formerly a C allocation
  • memory corruption ensues
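
The sequence above can be replayed in a toy model. This is only a sketch under loud assumptions: addresses are plain integers that are never dereferenced, the page table is a small boolean array, and every name (`page_table_lookup`, `gc_classify`, `gc_expand_heap`) is hypothetical rather than taken from the OCaml runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: addresses are integers and nothing is dereferenced. */
#define PAGE_SIZE 4096u
#define NUM_PAGES 64u

/* Stand-in for the runtime page table: true if a page is in the OCaml heap. */
static bool in_ocaml_heap[NUM_PAGES];

static bool page_table_lookup(uintptr_t addr) {
    size_t page = addr / PAGE_SIZE;
    return page < NUM_PAGES && in_ocaml_heap[page];
}

/* What the GC decides to do with a word it finds inside a heap block. */
typedef enum { SKIP_AS_EXTERNAL, SCAN_AS_OCAML_VALUE } gc_action;

static gc_action gc_classify(uintptr_t word) {
    return page_table_lookup(word) ? SCAN_AS_OCAML_VALUE : SKIP_AS_EXTERNAL;
}

/* The GC grows its heap over pages [first_page, first_page + count). */
static void gc_expand_heap(size_t first_page, size_t count) {
    for (size_t i = first_page; i < first_page + count && i < NUM_PAGES; i++)
        in_ocaml_heap[i] = true;
}

/* Replays the failure case; returns true if the stale word ends up scanned. */
bool stale_pointer_gets_scanned(void) {
    uintptr_t naked = 10 * PAGE_SIZE + 128;  /* pretend C malloc returned this */
    assert(gc_classify(naked) == SKIP_AS_EXTERNAL);  /* fine while outside the heap */
    /* The C side frees the block here; the OCaml heap still holds `naked`. */
    gc_expand_heap(8, 4);  /* heap now covers pages 8..11, including page 10 */
    return gc_classify(naked) == SCAN_AS_OCAML_VALUE;
}
```

In this model `stale_pointer_gets_scanned()` returns true: nothing about the stored word changed, only the page table did, which is why the problem cannot be detected from the pointer alone.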

The only way to really avoid this is to hold naked references only to static or global C values, which is a fairly minor use case. As @lpw25 notes, you can hold other pointers safely by wrapping them in custom blocks, which gives the GC a reliable way of determining what’s going on.

As for the question about a contiguous VA, this should work fine on 64-bit, where you have the luxury of such use of the address space. I built a version of this a decade ago for OCaml/Xen in early Mirage, which you can find evaluated in the HotCloud 2010 paper (Figure 4). It’s pretty straightforward, but the problems come from balancing external memory pressure (from C allocations) with the OCaml allocation. This can be adjusted with an obvious use of sbrk or realloc to grow or shrink the contiguous memory, while being careful to keep other memory allocations away from the OCaml area.
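
To make the contiguous-VA idea concrete, here is a minimal sketch assuming a single reserved range whose live portion grows and shrinks inside it. The `heap_region` type and both functions are hypothetical bookkeeping only; a real runtime would also have to obtain and release the pages from the operating system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one contiguous heap reservation. */
typedef struct {
    uintptr_t base;      /* start of the reserved address range */
    size_t    reserved;  /* total bytes of address space set aside */
    size_t    in_use;    /* bytes currently backing the OCaml heap */
} heap_region;

/* Membership is one subtraction and one unsigned comparison:
   addresses below `base` wrap around to a huge value and fail the test. */
bool region_contains(const heap_region *r, uintptr_t addr) {
    return addr - r->base < r->in_use;
}

/* Grow or shrink the live part; refuses to leave the reservation. */
bool region_resize(heap_region *r, size_t new_in_use) {
    if (new_in_use > r->reserved)
        return false;
    r->in_use = new_in_use;
    return true;
}
```

With this layout the "is this word inside the OCaml heap?" question no longer needs a page table lookup, but the reservation size becomes a policy decision, which is where the balancing against C allocations comes in.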

The current strategy will need to be maintained for 32-bit architectures however, which are very much supported (e.g. armv7). For those, there is very little wiggle room to hold a contiguous VA and so the current multicore approach lets us preserve a unified memory representation.

One observation I had when I read @stephenrkell’s excellent essay is how strange our current memory allocation mechanisms are in operating systems. We have conflated cooperative scheduling across components with enforcing protection from mutually untrusted control flow in the same language. For example, we have the system C malloc competing with the OCaml GC which competes with the kernel memory allocator. I’ve been sketching out a possible solution in multicore OCaml towards this:

  • We move away from Bigarray to a specialised Extvalue that handles external pages in a separate region of memory. Bigarray currently offers too much functionality (subarrays and proxies) which slows it down due to dropping into the C FFI.
  • The Extvalue is backed by a bundled slab allocator that works in a contiguous region of memory, disjoint from the OCaml heap.
  • The compiler provides primitives for very fast translation of values in and out of the Extvalue (as it does currently for Bigarray).
  • C libraries linked in with OCaml also use this memory allocator for their own mallocs. This will require some trickery (static compilation or LD_PRELOAD initially), but it means that all the allocations associated with a particular “task” (from OCaml to C or Rust code) can be batched together.
  • This approach lets us improve multicore memory locality greatly, as every modern machine has significant NUMA effects (see this FOSDEM 2013 talk), and cooperatively allocate memory. It also leaves open the possibility of separate isolation mechanisms (such as ARM memory domains or Intel MPK) across tasks in a large heap.
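
As a rough illustration of the slab allocator bullet, the following is a minimal sketch of a fixed-size slab over one contiguous array. It is not the experimental implementation: the object size, the slot count, and all names (`slab`, `slab_alloc`, `slab_free`, `slab_owns`) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLAB_OBJ_SIZE  64u   /* assumed fixed object size */
#define SLAB_OBJ_COUNT 128u  /* assumed number of slots */

/* A slot is either on the free list or holds a payload, never both. */
typedef union slab_slot {
    union slab_slot *next_free;
    unsigned char    payload[SLAB_OBJ_SIZE];
} slab_slot;

typedef struct {
    slab_slot  slots[SLAB_OBJ_COUNT];  /* the contiguous region */
    slab_slot *free_list;
} slab;

/* Thread every slot onto the free list, lowest address first. */
void slab_init(slab *s) {
    s->free_list = NULL;
    for (size_t i = SLAB_OBJ_COUNT; i-- > 0; ) {
        s->slots[i].next_free = s->free_list;
        s->free_list = &s->slots[i];
    }
}

/* Pop a slot; NULL means the slab is full and the caller must fall back. */
void *slab_alloc(slab *s) {
    slab_slot *slot = s->free_list;
    if (slot == NULL)
        return NULL;
    s->free_list = slot->next_free;
    return slot->payload;
}

/* Range check: does this pointer lie inside the slab's region? */
int slab_owns(const slab *s, const void *p) {
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)s->slots;
    return a - lo < sizeof s->slots;
}

/* Push the slot back onto the free list. */
void slab_free(slab *s, void *p) {
    assert(slab_owns(s, p));
    slab_slot *slot = p;
    slot->next_free = s->free_list;
    s->free_list = slot;
}
```

Because the slots live in one array that is disjoint from the OCaml heap, ownership is a cheap range check, and a `NULL` return from `slab_alloc` is the natural point at which to fall back to the system malloc.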

Please note that the above is still only at the experimental stage as I’m still evaluating it, but it does have the advantage of degrading gracefully if the system malloc has to be used (e.g. if OCaml is embedded as a library, no one expects 10Gb/s levels of network performance). From an ecosystem perspective, I don’t think anyone really wants to maintain the current hybrid world of a multitude of Bigarray-based overlays, such as cstruct or bigstring.
