[ANN] A dynamic checker for detecting naked pointers

We’re happy to release an OCaml compiler switch for dynamically detecting naked pointers in the code.

Naked pointers in OCaml

A naked pointer is a pointer outside the OCaml heap without a valid header. A header outside the heap is said to be valid if it is colored black. OCaml does permit naked pointers to word-aligned addresses. However, the presence of naked pointers incurs overhead in the garbage collector (GC). Whenever the GC intends to follow a pointer, it must check that the pointer is indeed in the OCaml heap. To do so, the GC consults a page table that records the pages currently used by the heap, and only follows the pointer if it belongs to one of those pages. For the multicore GC, keeping the page table consistent while multiple domains allocate and run the GC in parallel would require synchronizing reads and writes to the table. It is quite likely that this cost will be prohibitive.

Luckily, OCaml already has a no-naked-pointer mode, in which the compiler assumes that the code does not have naked pointers and hence does not consult the page table when following pointers during GC (except for Closure_tag objects). The no-naked-pointer mode is a configure-time option, enabled by configuring the compiler with --disable-naked-pointers. The Multicore OCaml compiler currently does not use a page table in its implementation.

Dynamic Check for naked pointers

With the aim of migrating to no-naked-pointer mode as the default in future releases of OCaml, eventually paving the way for upstreaming multicore support, we’re happy to release a variant of OCaml 4.10.0 with a dynamic checker for the presence of naked pointers in the code. OCaml PR#9534 has the discussion around this checker. This variant can be installed with:

$ opam update
$ opam switch create 4.10.0+nnpcheck
$ eval $(opam env)

Once the variant is installed, you can install your favorite libraries using opam and run your program to get a report of naked pointers. Let us look at an example. We know that frama-c has naked pointers.

$ opam install frama-c
$ frama-c
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)

The checker prints warnings to standard error with the address that contains the naked pointer, the naked pointer and the reason why the warning was raised.

Finding the sources

While the warnings are useful for identifying that the program has naked pointers, they do not help with finding the source of a naked pointer in the code. For this, we recommend the use of rr. rr is a record-and-replay framework that wraps around the familiar gdb interface. We can debug the error above as follows:

$ rr frama-c
rr: Saving execution to trace directory `/home/kc/.local/share/rr/frama-c-5'.
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144)
$ rr replay
(rr) watch *(value*)0x55fc1e2754d8
Hardware watchpoint 1: *(value*)0x55fc1e2754d8
(rr) c
Continuing.

Hardware watchpoint 1: *(value*)0x55fc1e2754d8

Old value = 1
New value = 94541327240384
0x000055fc1dab48f8 in camlUnmarshal__entry () at src/libraries/datatype/unmarshal.ml:72
72      src/libraries/datatype/unmarshal.ml: No such file or directory.

This corresponds to the naked pointer at https://github.com/Frama-C/Frama-C-snapshot/blob/master/src/libraries/datatype/unmarshal.ml#L72.

Fixing naked pointers

The recommended way of fixing naked pointers is to wrap them in an OCaml object with Custom_tag or Abstract_tag (as appropriate).

Limitations

The dynamic analysis only works on the AMD64 backend with GCC and Clang. It is known to work on Linux and macOS. rr currently requires an Intel CPU with the Nehalem (2010) or later microarchitecture.

Credits

The analysis was originally proposed by Mark Shinwell (@mshinwell).

Hi KC. Thanks for a nice post… I’ve learned something about OCaml, and great to see rr proving itself yet again.

As someone with an interest in cross-language interop, I like naked pointers. So I’m interested in design choices that might make the “our heap or not?” check fast, and in reasons why impls might not go for them. Feel free to answer “read the paper”, but I was thinking you could do something like reserve a big contiguous chunk of VAS for OCaml heaps’ use, and then the test could be a simple shift and compare. Would that be viable?

I am actually writing an RFC about that exactly (where I was actually quoting your essay @stephenrkell!). There are a few specifics for multicore, but after some research I concluded that this is doable. More to come!

Edit: welcome to this discuss!

Ah cool – thanks Guillaume! Glad to know it’s a known thing. (I also wrote a little about it in this paper, in section 6 I think.) Sadly, I am still in search of time/funds to work on these things myself.

Note: we had a proposed implementation to reserve a huge chunk of virtual memory by @jhjourdan in 2013: #6101, and I think this is now an option one can pass to the runtime system (but it is not activated by default).

Are you referring to the “huge pages” option? This is not what it does.

Ah, sorry, I was confused. (This option is still undocumented, so it is painful to check this kind of thing.) In any case, #6101 is a big-mmap approach to the page table.

I don’t know if there is a deep reason why Multicore cannot use this approach for a page table – @kayceesrk would be able to tell. Maybe it is more that some people were already advocating for the no-naked-pointers mode (it is an experimental option in the upstream runtime), and we felt we were “close enough” to obtaining this guarantee that it was reasonable to make this assumption.

(The virtual address-space trick does have costs, for example it does not work well with a lot of system tooling out there, and I’m not sure it would scale that well to a scenario with many different/separate heaps that each want their fast check. We can get up to 2^32 heaps of size 2^32, but this is not enough for everyone.)

During my research, I have read many misconceptions about virtual addressing, which I intend to clear up. I propose that we wait (my RFC will hopefully be available later today) so that we can have a focused conversation. I am very interested in the scenarios you can come up with. (The RFC originally grew out of a response to #6101.)

It is worth pointing out that naked pointers are extremely rare and strongly discouraged. They are hard to use safely even now. Expending any effort to continue supporting them is not a good use of maintainers’ time.

Something that I don’t think is entirely clear from KC’s post: you can have pointers to outside the OCaml heap as part of “abstract” or “custom” blocks. Those are the recommended ways to reference external data and they will continue to work going forward.

As @gasche mentioned earlier, the no-naked-pointers mode was already there in OCaml and it is known to work on all the platforms that OCaml supports. Hence, it was a reasonable path to pursue for Multicore.

The concurrent minor collector in Multicore OCaml uses the virtual address space trick, but only for the minor heap area. It needs a contiguous 4GB reservation for 128 domains, each with a minor heap arena of at most 16MB. This can be modified at compiler configure time. For comparison, the minor heap is 2MB by default in OCaml, so 16MB should be quite enough. We hadn’t considered this trick for the major heap in Multicore.

However, given our experimental evaluation (see paper), we have chosen not to pursue the concurrent minor collector for the initial version of multicore support to be upstreamed. The alternative stop-the-world parallel minor collector scales better and does not break the C FFI. The parallel minor collector does not need the virtual address space trick.

Given that the space for the entire heap would have to be reserved, how would it work on 32-bit architectures, and does it have an impact on system tooling? Looking forward to reading @gadmm’s RFC.

The minor heap in Coq is 256MB, if I remember correctly. Given the way OCaml and Coq have evolved, I do not know if such a huge minor heap is still warranted nowadays. But it certainly was a much needed change back then.

Also, I just checked with Why3. While a 256MB minor heap does not seem that useful, the tool would certainly benefit from 32MB or 64MB.

As always, mileages and all that.

Interesting. I didn’t know Coq uses a 256MB minor heap. I would be interested in seeing benchmarks that benefit from large minor heaps. Multicore recently gained the ability to compile Coq, and we are working on adding Coq benchmarks to Sandmark.

To be clear, the current minor GC scheme in Multicore is the stop-the-world parallel collector which does not place a size restriction on the minor heap. So large minor heaps should work. However, the Multicore OCaml implementation still carries vestiges of the concurrent minor collector and will prevent you from creating large heaps, but there’s no technical reason why it should.

About Sandmark: Are you taking any relevant workloads? Is there anything else to do than opening a PR on the repo? Any additional procedure to follow? Are you looking for specific kinds of benchmark?

I’m considering contributing a benchmark for data-encoding (serialising/deserialising library).

Hi @raphael-proust. We don’t have any specific requirements. Please go ahead!

One caveat is that all of our benchmarks are CPU intensive. We’ve spent a bunch of time thinking about eliminating noise for CPU intensive workloads. We’ve not thought about I/O much. Is data-encoding going to be I/O intensive?

Please do note that this isn’t a performance improvement for OCaml – this is very much a correctness fix. The failure case is as follows:

  • A naked pointer is created using malloc on the C heap and held in the OCaml heap.
  • The external region is freed, but the naked pointer is still held in some OCaml heap.
  • The GC mallocs to expand, and that recently freed C memory becomes part of the OCaml heap.
  • The GC then follows the naked pointer by treating it as an OCaml value, since the page table indicates that it is within the OCaml heap. However, the memory the naked pointer is aimed at is not necessarily a valid OCaml value, as it was formerly C memory.
  • Memory corruption ensues.

The only way to really avoid this is by only holding naked references to static or global C values, which is very much a minority use case. As @lpw25 notes, you can hold them safely by wrapping them in custom blocks, which is entirely safe as it gives the GC a reliable way of determining what’s going on.

As for the question about a contiguous VA, this should work fine on 64-bit, where you have the luxury of such use of the address space. I built a version of this a decade ago for OCaml/Xen in early Mirage, which you can find evaluated in the HotCloud 2010 paper (Figure 4). It’s pretty straightforward, but the problems come from balancing external memory pressure (from C allocations) with the OCaml allocation. This can be adjusted with an obvious use of sbrk or realloc to grow or shrink the contiguous memory, while being careful to keep other memory allocations away from the OCaml area.

The current strategy will need to be maintained for 32-bit architectures however, which are very much supported (e.g. armv7). For those, there is very little wiggle room to hold a contiguous VA and so the current multicore approach lets us preserve a unified memory representation.

One observation I had when I read @stephenrkell’s excellent essay is how strange our current memory allocation mechanisms are in operating systems. We have conflated cooperative scheduling across components with enforcing protection from mutually untrusted control flow in the same language. For example, we have the system C malloc competing with the OCaml GC which competes with the kernel memory allocator. I’ve been sketching out a possible solution in multicore OCaml towards this:

  • We move away from Bigarray to a specialised Extvalue that handles external pages in a separate region of memory. Bigarray currently offers too much functionality (subarrays and proxies) which slows it down due to dropping into the C FFI.
  • The Extvalue is backed by a bundled slab allocator that works in a contiguous region of memory, disjoint from the OCaml heap.
  • The compiler provides primitives for very fast translation of values in and out of the Extvalue (as it does currently for Bigarray).
  • C libraries linked in with OCaml also use this memory allocator for their own mallocs. This will require some trickery (static compilation or LD_PRELOAD initially), but it means that all the allocations associated with a particular “task” (from OCaml to C or Rust code) can be batched together.
  • This approach lets us improve multicore memory locality greatly, as every modern machine has significant NUMA effects (see this FOSDEM 2013 talk), and cooperatively allocate memory. It also leaves open the possibility of separate isolation mechanisms (such as ARM memory domains or Intel MPK) across tasks in a large heap.

Please note that the above is still only at the experimental stage as I’m still evaluating it, but it does have the advantage of degrading gracefully if the system malloc has to be used (e.g. if OCaml is embedded as a library, no one expects 10-gigabit levels of network performance). From an ecosystem perspective, I don’t think anyone really wants to maintain the current hybrid world of a multitude of Bigarray-based overlays, such as cstruct or bigstring.

The main thing that the library does is take arbitrarily complex/big values and serialise them into bytes or json. So it’d be memory-access intensive, but no files/sockets/etc.

Please do submit that. It would be handy if it had a mode to do a “loopback” test over a localhost socket to stress that part, but it’s already useful to have a test which is a combination of GC pressure, memory access, and CPU.

Writing that it is “entirely safe” is an overstatement. You still need to make sure that the pointer, whether naked or wrapped in a custom block, is no longer accessible from the OCaml code once the memory has been freed. For example, you can make sure that the memory is not freed before the finalizer of the custom block has been called. But if you go that way, you would have been just as safe if the C pointer had been stored naked in a non-custom block (e.g., a reference) with an opaque type. Indeed, in both cases, the naked pointer is no longer on the OCaml heap once memory is freed, so the scenario you describe cannot occur. Custom blocks being safer than naked pointers is just a myth.

I’m not following the details of the general discussion, but something you can do with custom blocks is that you can zero the pointer stored in the custom block when it is freed. This allows you to fail at runtime if you try to access a freed object via the custom block, instead of segfaulting.

Your comment supports my point. The trick you describe has absolutely nothing to do with custom blocks. You can do the exact same thing (zeroing a pointer) with a non-custom block. Custom blocks are not intrinsically safer than non-custom ones. Just because you store a naked pointer into a custom block does not magically make your program immune to memory corruption. Any extra step you have to take to ensure safety can also be used in the non-custom case.