[ANN] A dynamic checker for detecting naked pointers

As @gasche mentioned earlier, the no-naked-pointers mode already exists in OCaml and is known to work on all the platforms that OCaml supports. Hence, it was a reasonable path to pursue for Multicore.

The concurrent minor collector in Multicore OCaml uses the virtual address space trick, but only for the minor heap area. It needs a contiguous 4GB reservation for 128 domains, each with a minor heap arena of at most 16MB. This can be modified at compiler configure time. For comparison, the minor heap is 2MB by default in OCaml, so 16MB should be quite enough. We hadn’t considered this trick for the major heap in Multicore.
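A rough sketch of why a contiguous reservation is convenient: with one reserved range, "is this address in a minor heap?" becomes a range comparison. The sizes are the ones quoted above; the base address, the function names, and the slot layout are assumptions made up for illustration, not the Multicore runtime's actual code. It also assumes a 64-bit OCaml, where `int` can hold these sizes.

```ocaml
(* Illustrative only: hypothetical layout, 64-bit OCaml assumed. *)
let mib = 1024 * 1024
let arena_size = 16 * mib          (* maximum minor heap arena per domain *)
let reserved = 4 * 1024 * mib      (* size of the contiguous reservation *)
let slots = reserved / arena_size  (* arena-sized slots that fit in it *)

(* [base] is a hypothetical start address of the reserved range. *)
let is_minor ~base addr = addr >= base && addr < base + reserved

(* Index of the arena-sized slot an address falls in.
   Only meaningful when [is_minor ~base addr] holds. *)
let arena_index ~base addr = (addr - base) / arena_size
```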

However, given our experimental evaluation (see paper), we have chosen not to pursue the concurrent minor collector for the initial version of multicore support to be upstreamed. The alternative stop-the-world parallel minor collector scales better and does not break the C FFI. The parallel minor collector does not need the virtual address space trick.

Given that the space for the entire heap should be reserved, how would it work on 32-bit architectures, and does it have an impact on system tooling? Looking forward to reading @gadmm’s RFC.

The minor heap in Coq is 256MB, if I remember correctly. Given the way OCaml and Coq have evolved, I do not know if such a huge minor heap is still warranted nowadays. But it certainly was a much needed change back then.

Also, I just checked with Why3. While a 256MB minor heap does not seem that useful, the tool would certainly benefit from 32MB or 64MB.

As always, mileages and all that.

Interesting. I didn’t know Coq uses a 256MB minor heap. I would be interested in seeing benchmarks that benefit from large minor heaps. Multicore recently gained the ability to compile Coq, and we are working on adding Coq benchmarks to Sandmark.

To be clear, the current minor GC scheme in Multicore is the stop-the-world parallel collector which does not place a size restriction on the minor heap. So large minor heaps should work. However, the Multicore OCaml implementation still carries vestiges of the concurrent minor collector and will prevent you from creating large heaps, but there’s no technical reason why it should.

About Sandmark: Are you taking any relevant workloads? Is there anything to do other than opening a PR on the repo? Any additional procedure to follow? Are you looking for specific kinds of benchmarks?

I’m considering contributing a benchmark for data-encoding (serialising/deserialising library).

Hi @raphael-proust. We don’t have any specific requirements. Please go ahead!

One caveat is that all of our benchmarks are CPU intensive. We’ve spent a bunch of time thinking about eliminating noise for CPU intensive workloads. We’ve not thought about I/O much. Is data-encoding going to be I/O intensive?

Please do note that this isn’t a performance improvement for OCaml – this is very much a correctness fix. The failure case is as follows:

  • a naked pointer is created using malloc on the C heap and held in the OCaml heap
  • the external region is freed, but the naked pointer is still held in some OCaml heap.
  • the GC mallocs to expand, and that recently freed C memory becomes part of the OCaml heap
  • the GC then follows the naked pointer by treating it as an OCaml value, since the page table indicates that it is within the OCaml heap. However, the memory the naked pointer is aimed at is not necessarily a valid OCaml value as it was formerly a C pointer.
  • memory corruption ensues
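The sequence above can be replayed in a toy model. This is a sketch only: addresses are plain integers, the "page table" is a list of page numbers, and `gc_would_follow` is a hypothetical stand-in for the runtime's in-heap check, not the real implementation.

```ocaml
(* Toy model, not the real runtime: addresses are plain ints and the
   "page table" is a list of page numbers that belong to the OCaml heap. *)
let page_size = 4096
let page_of addr = addr / page_size

(* The GC follows a word only if the page table says it is in the OCaml heap. *)
let gc_would_follow page_table addr = List.mem (page_of addr) page_table

let () =
  let page_table = [ 1; 2 ] in        (* pages owned by the OCaml heap *)
  let naked = (10 * page_size) + 64 in (* pointer from malloc, on page 10 *)
  (* The pointer is outside the heap, so the GC skips it. This is still
     true right after the C side frees the block. *)
  assert (not (gc_would_follow page_table naked));
  (* The GC grows its heap and is handed the freed page. *)
  let page_table = page_of naked :: page_table in
  (* The stale pointer now looks like a heap value and would be followed. *)
  assert (gc_would_follow page_table naked)
```

Nothing in the OCaml program changed between the two checks; only the allocator's decision about where to grow the heap did, which is what makes the bug hard to reproduce.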

The only way to really avoid this is by only holding naked references to static or global C values, which is very much a minority use case. As @lpw25 notes, you can hold them safely by wrapping them in custom blocks, which is entirely safe as it gives the GC a reliable way of determining what’s going on.

As for the question about a contiguous VA, this should work fine on 64-bit, where you have the luxury of such use of the address space. I built a version of this a decade ago for OCaml/Xen in early Mirage, which you can find evaluated in the HotCloud 2010 paper (Figure 4). It’s pretty straightforward, but the problems come from balancing external memory pressure (from C allocations) with the OCaml allocation. This can be adjusted with an obvious use of sbrk or realloc to grow or shrink the contiguous memory, while being careful to keep other memory allocations away from the OCaml area.

The current strategy will need to be maintained for 32-bit architectures however, which are very much supported (e.g. armv7). For those, there is very little wiggle room to hold a contiguous VA and so the current multicore approach lets us preserve a unified memory representation.

One observation I had when I read @stephenrkell’s excellent essay is how strange our current memory allocation mechanisms are in operating systems. We have conflated cooperative scheduling across components with enforcing protection from mutually untrusted control flow in the same language. For example, we have the system C malloc competing with the OCaml GC which competes with the kernel memory allocator. I’ve been sketching out a possible solution in multicore OCaml towards this:

  • We move away from Bigarray to a specialised Extvalue that handles external pages in a separate region of memory. Bigarray currently offers too much functionality (subarrays and proxies) which slows it down due to dropping into the C FFI.
  • The Extvalue is backed by a bundled slab allocator that works in a contiguous region of memory, disjoint from the OCaml heap.
  • The compiler provides primitives for very fast translation of values in and out of the Extvalue (as it does currently for Bigarray).
  • C libraries linked in with OCaml also use this memory allocator for their own mallocs. This will require some trickery (static compilation or LD_PRELOAD initially), but it means that all the allocations associated with a particular “task” (from OCaml to C or Rust code) can be batched together.
  • This approach lets us improve multicore memory locality greatly, as every modern machine has significant NUMA effects (see this FOSDEM 2013 talk), and cooperatively allocate memory. It also leaves open the possibility of separate isolation mechanisms (such as ARM memory domains or Intel MPK) across tasks in a large heap.

Please note that the above is still only at the experimental stage as I’m still evaluating it, but it does have the advantage of degrading gracefully if the system malloc has to be used (e.g. if OCaml is embedded as a library, no one expects 10Gb/s levels of network performance). From an ecosystem perspective, I don’t think anyone really wants to maintain the current hybrid world of a multitude of Bigarray-based overlays, such as cstruct or bigstring.

The main thing that the library does is take arbitrarily complex/big values and serialise them into bytes or json. So it’d be memory-access intensive, but no files/sockets/etc.

Please do submit that. It would be handy if it had a mode to do a “loopback” test over a localhost socket to stress that part, but it’s already useful to have a test which is a combination of GC pressure, memory access, and CPU.

Writing that it is “entirely safe” is an overstatement. You still need to make sure that the pointer, whether naked or wrapped in a custom block, is no longer accessible from the OCaml code once the memory has been freed. For example, you can make sure that the memory is not freed before the finalizer of the custom block has been called. But if you go that way, you would have been just as safe if the C pointer had been stored naked in a non-custom block (e.g., a reference) with an opaque type. Indeed, in both cases, the naked pointer is no longer on the OCaml heap once memory is freed, so the scenario you describe cannot occur. Custom blocks being safer than naked pointers is just a myth.

I’m not following the details of the general discussion, but something you can do with custom blocks is that you can zero the pointer stored in the custom block when it is freed. This allows you to fail at runtime if you try to access a freed object via the custom block, instead of segfaulting.
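A minimal sketch of the zeroing trick, modelled in plain OCaml rather than in a C stub. The `handle` type, `Use_after_free`, and the `release` parameter are hypothetical names for illustration; a `nativeint` stands in for the C pointer, and `release` stands in for whatever actually frees the C memory.

```ocaml
(* Hypothetical handle: [addr] stands in for a pointer to malloced memory. *)
type handle = { mutable addr : nativeint }

exception Use_after_free

(* [release] is whatever frees the C memory; it is passed in so the
   sketch stays self-contained. Freeing twice is a no-op. *)
let free ~release h =
  if not (Nativeint.equal h.addr 0n) then begin
    release h.addr;
    h.addr <- 0n (* zero the stored pointer so later uses are detected *)
  end

(* Accessing a freed handle fails with an exception instead of
   handing a dangling address to C code. *)
let get h =
  if Nativeint.equal h.addr 0n then raise Use_after_free else h.addr
```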

Your comment supports my point. The trick you describe has absolutely nothing to do with custom blocks. You can do the exact same thing (zeroing a pointer) with a non-custom block. Custom blocks are not intrinsically safer than non-custom ones. Just because you store a naked pointer into a custom block does not magically make your program immune to memory corruption. Any extra step you have to take to ensure safety can also be used in the non-custom case.

I’m afraid I don’t understand. I posted a specific unsafety with naked pointers above that is avoided by wrapping them in custom blocks, so the GC doesn’t follow them. I’m not claiming anything about the memory safety when actually using a C library – for that we have techniques such as those in ocaml ctypes.

Could you please post an example of how GC memory corruption might occur when the pointer is wrapped in a custom block?

Sure. Just call free on the C memory block while the custom block containing the pointer is still reachable from the OCaml code. See below for the full argument.

As far as I can tell, your example is flawed.

Your scenario assumes that there is a custom block containing an already freed pointer. First, let us assume that the custom block is still reachable from OCaml code. In particular, it can still be passed to C functions. Unless your C functions have a way to detect that the pointer stored in the block is invalid, this is the classical use-after-free memory corruption. Thus, the code is unsafe, and using a custom block does not change anything about the issue.

So, if the code using custom blocks is safe, it means that we cannot be in that case. In particular, it means that the C memory has necessarily been freed after the custom block has become unreachable. For example, I tend to use finalizers to ensure this kind of temporal property.
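The finalizer approach can be sketched in plain OCaml with `Gc.finalise`. The `handle` type and the `malloc` and `free` parameters are hypothetical stand-ins for C stubs. The sketch only shows the ordering guarantee: `free` is called by the GC, and so only after the handle has become unreachable. It does not say when that happens, since finalisation functions run at unpredictable times.

```ocaml
(* Hypothetical handle around an address obtained from C. *)
type handle = { addr : nativeint }

(* [malloc] and [free] are stand-ins for C stubs, passed in so the
   sketch is self-contained. *)
let make ~malloc ~free size =
  let h = { addr = malloc size } in
  (* The GC calls this once [h] is unreachable from the OCaml program,
     so the memory cannot be freed while OCaml code can still use it. *)
  Gc.finalise (fun h -> free h.addr) h;
  h
```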

But my knowledge of OCaml is lacking. So, I might be missing something. Which trick do you use that works for custom blocks containing a pointer to malloced memory but would not work if the block was non-custom?

Let me put it more formally. I argue that any sane C code out there that calls caml_alloc_small(sz, Custom_tag) and then stores a pointer to malloced memory into it could be changed to use a different tag with no adverse effect. Could you please show me some non-artificial C code, where changing Custom_tag to some other tag, e.g., 42, would cause the memory corruption scenario you describe?

@silene and @nojb discuss the usual resource safety (no use-after-free, etc.), which has to do with programming bugs. It is true that naked pointers and custom blocks are similar from that point of view. Custom and abstract blocks are one way of fixing the issue mentioned by @avsm, which is of a different nature (it concerns what is reachable from the GC, not what the program written by the user tries to access).

But that is my point. Since OCaml’s GC can run at any time, the difference between what is reachable from the GC and what is reachable from the user does not matter much here. If the code has been made safe with respect to user accesses, then I argue that it is also safe with respect to GC accesses. You have to go to great lengths to be safe from user accesses yet be unsafe from GC accesses.

For example, you have to store into your block both a pointer and a boolean saying whether the pointer is valid. But this is completely artificial. In any sane code, both fields would have been conflated into a potentially null pointer.

I don’t think that’s true. You could have a record with pointers to the C heap. When you no longer want to reference these C pointers from OCaml (e.g. they may be managed by the C part of the program), you simply drop all references to that record in the program. The GC, however, will still have to collect that record, at which point you may hit the problem mentioned above (which has been described in the manual here for as long as I have been FFIng OCaml with C, I think) – if your pointers are naked.

Let me ask it again. For the memory corruption to occur, you need the C code to call free between the time the OCaml code stopped referencing the record and the time the GC ran. How does your C code know that calling free at that time was safe? There is no magic. Either the user or the GC had to tell the C code one way or another that it was now safe to free the memory. If it is the GC (through a finalizer), then no memory corruption can occur, since the block will not be scanned. If it is the user, then the C code has to take some extra care to make sure that the user is not keeping a copy of the now invalid pointer around, for example by setting the pointer to null. Again, no memory corruption can occur, since the scanned block will contain only a null pointer.

Not necessarily. Maybe these pointers were simply pointing to C substructures owned by another C structure, whose free function is in charge of freeing them.

Another example where pointers are not managed by C but by you is an OCaml record with an immutable pointer field to an associated C structure. You drop all references to that record in the OCaml program and then free the C pointer without mutating the pointer in the record to NULL (it’s immutable, so you can’t do that). That’s safe for the user, but it’s not safe for the GC if your pointer is naked.

Another example is code that:

  1. mallocs some memory,
  2. creates some OCaml values containing naked pointers into that memory,
  3. computes some pure OCaml value from them,
  4. concludes algorithmically that the malloced memory will never be used again and so,
  5. frees it,
  6. returns the pure OCaml value result and computes some more.

Since it is possible for the current OCaml GC to grow its heap into the just-freed memory, it is necessary at step 4 to additionally ensure that all the values created at step 2 are dead and to call Gc.full_major so that no naked pointers become dangling by the call to free. In a way this falls under the umbrella of ensuring safety, but I think that the need to consider creating dangling pointers from dead but uncollected values is something that is easily overlooked and gotten wrong. It involves a mindset for manual memory management that is more involved than for plain C code.
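The ordering described above can be sketched as follows. This is an illustration under assumptions: `with_c_buffer` and its `malloc`, `free`, and `compute` parameters are hypothetical names standing in for C stubs and user code, and `compute` is trusted not to let any value pointing into the buffer escape.

```ocaml
(* Sketch of the ordering only. [malloc], [free] and [compute] are
   hypothetical parameters; [compute] must return a pure OCaml value
   and must not leak anything that points into the buffer. *)
let with_c_buffer ~malloc ~free ~size compute =
  let buf = malloc size in     (* step 1 *)
  let result = compute buf in  (* steps 2-3: pointers are created and used inside *)
  (* Step 4: those values are dead now, but may still sit uncollected
     in the heap, so collect them before the memory can be reused. *)
  Gc.full_major ();
  free buf;                    (* step 5 *)
  result                       (* step 6 *)
```

Keeping the whole sequence inside one function is what makes the "all values from step 2 are dead" condition checkable; once such values can escape to the caller, the call to `Gc.full_major` no longer guarantees anything.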

If I understand them correctly, neither of these two examples is memory-safe from a user point of view. If the OCaml user were to call the proposed functions in a different order, a memory corruption would occur, irrespective of the GC. So, my claim still stands: If a set of functions is safe with respect to user accesses (no use-after-free whatsoever), it will also be safe with respect to GC accesses.

This one is a lot more convincing, since it contains a single function. Thanks. It necessitates some tight interplay between C and OCaml code (otherwise you would not be able to conclude anything about the safety). I do not remember having ever seen this pattern in practice. But I can imagine how some code could end up looking like that.