[ANN] A dynamic checker for detecting naked pointers

Writing that it is “entirely safe” is an overstatement. You still need to make sure that the pointer, whether naked or wrapped in a custom block, is no longer accessible from the OCaml code once the memory has been freed. For example, you can make sure that the memory is not freed before the finalizer of the custom block has been called. But if you go that way, you would have been just as safe if the C pointer had been stored naked in a non-custom block (e.g., a reference) with an opaque type. Indeed, in both cases, the naked pointer is no longer on the OCaml heap once memory is freed, so the scenario you describe cannot occur. Custom blocks being safer than naked pointers is just a myth.

I’m not following the details of the general discussion, but one thing you can do with custom blocks is zero the pointer stored in the custom block when the memory is freed. This allows you to fail at runtime if you try to access a freed object via the custom block, instead of segfaulting.

Your comment supports my point. The trick you describe has absolutely nothing to do with custom blocks. You can do the exact same thing (zeroing a pointer) with a non-custom block. Custom blocks are not intrinsically safer than non-custom ones. Just because you store a naked pointer into a custom block does not magically make your program immune to memory corruption. Any extra step you have to take to ensure safety can also be used in the non-custom case.
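A minimal sketch of the zeroing trick in plain C, with no OCaml runtime involved. The `handle` struct stands in for the payload of whatever block (custom or not) holds the pointer; `handle`, `handle_init`, `handle_release`, and `handle_read` are hypothetical names chosen for illustration, not part of any OCaml API.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the payload of the OCaml block that holds the C pointer.
   Hypothetical type: nothing here depends on the block's tag. */
typedef struct {
    int *ptr; /* NULL once the C memory has been freed */
} handle;

/* Allocate the C object and record its address in the handle.
   Returns 0 on success, -1 if malloc fails. */
static int handle_init(handle *h, int initial)
{
    assert(h != NULL);
    h->ptr = malloc(sizeof *h->ptr);
    if (h->ptr == NULL) return -1;
    *h->ptr = initial;
    return 0;
}

/* Free the C object and zero the stored pointer so that later accesses
   can be detected. Calling it twice is harmless: free(NULL) is a no-op. */
static void handle_release(handle *h)
{
    assert(h != NULL);
    free(h->ptr);
    h->ptr = NULL;
}

/* Read through the handle. Returns 0 and stores the value in *out,
   or -1 if the object has already been released. */
static int handle_read(const handle *h, int *out)
{
    assert(h != NULL && out != NULL);
    if (h->ptr == NULL) return -1;
    *out = *h->ptr;
    return 0;
}
```

In a real binding, the `-1` case would typically be turned into an OCaml exception rather than a return code. The check itself only looks at the stored pointer, which is why it works the same way whatever kind of block contains it.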

I’m afraid I don’t understand. I posted a specific unsafety with naked pointers above that is avoided by wrapping them in custom blocks, so that the GC doesn’t follow them. I’m not claiming anything about memory safety when actually using a C library – for that we have techniques such as those in ocaml ctypes.

Could you please post an example of how GC memory corruption might occur when the pointer is wrapped in a custom block?

Sure. Just call free on the C memory block while the custom block containing the pointer is still reachable from the OCaml code. See below for the full argument.

As far as I can tell, your example is flawed.

Your scenario assumes that there is a custom block containing an already freed pointer. First, let us assume that the custom block is still reachable from OCaml code. In particular, it can still be passed to C functions. Unless your C functions have a way to detect that the pointer stored in the block is invalid, this is the classic use-after-free memory corruption. Thus, the code is unsafe, and using a custom block does not change anything about the issue.

So, if the code using custom blocks is safe, it means that we cannot be in that case. In particular, it means that the C memory has necessarily been freed after the custom block has become unreachable. For example, I tend to use finalizers to ensure this kind of temporal property.

But my knowledge of OCaml is lacking. So, I might be missing something. Which trick do you use that works for custom blocks containing a pointer to malloced memory but would not work if the block was non-custom?

Let me put it more formally. I argue that any sane C code out there that calls caml_alloc_small(sz, Custom_tag) and then stores a pointer to malloced memory into it could be changed to use a different tag with no adverse effect. Could you please show me some non-artificial C code, where changing Custom_tag to some other tag, e.g., 42, would cause the memory corruption scenario you describe?

@silene and @nojb discuss the usual resource safety (no use-after-free, etc.), which has to do with programming bugs. It is true that naked pointers and custom blocks are similar from that point of view. Custom and abstract blocks are one way of fixing the issue mentioned by @avsm, which is of a different nature (it concerns what is reachable from the GC, not what the program written by the user tries to access).

But that is my point. Since OCaml’s GC can run at any time, the difference between what is reachable from the GC and what is reachable from the user does not matter much here. If the code has been made safe with respect to user accesses, then I argue that it is also safe with respect to GC accesses. You have to go to great lengths to be safe from user accesses yet be unsafe from GC accesses.

For example, you have to store into your block both a pointer and a boolean saying whether the pointer is valid. But this is completely artificial. In any sane code, both fields would have been conflated into a potentially null pointer.

I don’t think that’s true. You could have a record with pointers to the C heap. When you no longer want to reference these C pointers from OCaml (e.g. they may be managed by the C part of the program), you simply drop all references to that record in the program. The GC, however, will still have to collect that record, at which point you may hit the problem mentioned above (which has been described in the manual here for as long as I have been FFIng OCaml with C, I think) if your pointers are naked.

Let me ask it again. For the memory corruption to occur, you need the C code to call free between the time the OCaml code stopped referencing the record and the time the GC ran. How does your C code know that calling free at that time was safe? There is no magic. Either the user or the GC had to tell the C code one way or another that it was now safe to free the memory. If it is the GC (through a finalizer), then no memory corruption can occur, since the block will not be scanned. If it is the user, then the C code has to take some extra care to make sure that the user is not keeping a copy of the now invalid pointer around, for example by setting the pointer to null. Again, no memory corruption can occur, since the scanned block will contain only a null pointer.

Not necessarily. Maybe these pointers were simply pointing to C substructures owned by another C structure, whose free function is in charge of freeing them.

Another example, where the pointers are managed not by C but by you, is an OCaml record with an immutable pointer field to an associated C structure. You drop all references to that record in the OCaml program and then free the C pointer without mutating the record’s pointer to NULL (it’s immutable, so you can’t do that). That’s safe for the user, but it’s not safe for the GC if your pointer is naked.

Another example is code that:

  1. mallocs some memory,
  2. creates some OCaml values containing naked pointers into that memory,
  3. computes some pure OCaml value from them,
  4. concludes algorithmically that the malloced memory will never be used again and so,
  5. frees it,
  6. returns the pure OCaml result and computes some more.

Since it is possible for the current OCaml GC to grow its heap into the just-freed memory, it is necessary at step 4 to additionally ensure that all the values created at step 2 are dead and to call Gc.full_major so that no naked pointers become dangling by the call to free. In a way this falls under the umbrella of ensuring safety, but I think that the need to consider creating dangling pointers from dead but uncollected values is something that is easily overlooked and gotten wrong. It involves a mindset for manual memory management that is more involved than for plain C code.

If I understand them correctly, neither of these two examples is memory-safe from a user point of view. If the OCaml user were to call the proposed functions in a different order, a memory corruption would occur, irrespective of the GC. So, my claim still stands: If a set of functions is safe with respect to user accesses (no use-after-free whatsoever), it will also be safe with respect to GC accesses.

This one is a lot more convincing, since it contains a single function. Thanks. It necessitates some tight interplay between C and OCaml code (otherwise you would not be able to conclude anything about the safety). I do not remember having ever seen this pattern in practice. But I can imagine how some code could end up looking like that.

I never said this had to happen via two different functions.

@stephenrkell @kayceesrk I have a draft here, not yet submitted, if you are curious: https://github.com/gadmm/RFCs/blob/interop/rfcs/interop.md. I still have to format it and decide where to post it (maybe the RFC repo, maybe not; I’ll reread the repository guidelines tomorrow, as my goal is to have a discussion around an evolving document).

That is right. But that was my assumption all along. So I stand corrected: When the C code does not need to account for an adversarial OCaml user, naked pointers can easily cause a memory corruption at collection time while custom blocks are entirely safe.

Just for info, this scenario is not contrived at all. I work with some code that follows this pattern, where the steps are roughly:

  • input file name containing llvm bitcode
  • ask libllvm to read it, receiving a pointer to its in-memory data structure representation
  • use the llvm bindings to translate that data structure to a pure OCaml representation
  • force a GC to remove all the dead iterators, memoization tables, etc. that contain naked pointers
  • let libllvm free its representation
  • proceed computing with the pure OCaml representation

Short of rewriting the llvm bindings, I am not aware of a better or more future-proof implementation strategy, even ignoring any performance concerns about allocating a block for every one of the very many pointers passed from libllvm to OCaml.

Are they? Some instances of naked pointers have been reported, and the tool to detect naked pointers has only just been released. Important examples include the use made by the ancient and netmulticore (ocamlnet) libraries, which was reported early on, and which has no replacement whatsoever in the no-naked-pointers world.

Concerning safety, it is instructive to look at how they avoid the issue of out-of-heap pointers that become in-heap after the heap grows. Ancient uses its own allocator (mmalloc) while netmulticore writes to a shared memory object. Both ultimately come down to allocating with mmap, instead of using the system allocator as OCaml does. From looking at the code, I think that netmulticore is programmed to avoid the issue (only the main process frees mappings, after the children exit), whereas for Ancient I think mmalloc makes the issue improbable, if not impossible. If it is merely improbable, the fix would be very simple: just keep the address space reserved (i.e. instead of munmap, use mprotect with the PROT_NONE flag, and maybe also madvise with MADV_FREE or MADV_DONTNEED). Note that one can also take an off-the-shelf high-performance allocator that offers this feature (jemalloc with option opt.retain).
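A minimal C sketch of the "keep the address space reserved" idea, assuming a Linux-like system with anonymous mmap. `region_map` and `region_retire` are hypothetical names for illustration; they are not functions from ancient, netmulticore, or jemalloc.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and madvise on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of private anonymous memory (len should be a multiple
   of the page size). Returns NULL on failure. */
static void *region_map(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Free" the region without returning its addresses to the system:
   revoke all access, then let the kernel reclaim the physical pages.
   The range stays mapped, so no later allocation can be placed there.
   Returns 0 on success, -1 on failure. */
static int region_retire(void *p, size_t len)
{
    assert(p != NULL);
    if (mprotect(p, len, PROT_NONE) != 0) return -1;
    return madvise(p, len, MADV_DONTNEED);
}
```

Because the range is never unmapped, a stale pointer into it can never end up pointing into memory that the OCaml heap later acquired. The cost is that the virtual address range is consumed for the lifetime of the process, and any access through a stale pointer faults instead of reading recycled data.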

The reason, I believe, why the issue is a reality concerning C interoperability in current OCaml is that OCaml gets its memory chunks from the system allocator, which is designed to recycle memory.

Concerning maintainer time, I will just note that the no-naked-pointers mode was far from ready for the removal of the page table at the moment the decision to drop naked pointers appears to have been taken. It took engineering time through a few PRs, especially regarding the representation of closures, which is still not finished. That aspect by itself slowed down multicore development, because the workaround for the absence of a test for in-heap pointers was brittle. Then, once naked pointers are removed, it will be the community’s turn to deal with the consequences. Beyond engineers, that community includes academics and open source volunteers, who are not paid for that job and who (I hope) have better things to do. I make my point in more detail in my draft, but I do not think proper care has been taken to evaluate the consequences and to find alternatives.

This reply surprises me, since this is the first time I have seen such a claim proposed to the scientific community (not the failure case, but the reason to remove naked pointers). It appeared in none of the PRs, issues, discuss threads, caml-list messages, etc., from the past 5 years that I have read in detail in recent weeks. The claim in the last sentence is incorrect, and I would have liked to have had a chance sooner to explain what ancient and netmulticore do about this problem.

On the other hand, there have been many mentions on the caml-list and on this forum showing interest in naked pointers à la ancient and netmulticore, from OCaml users and from scientists. See for instance here and my own.

With my draft proposal, you could remove the hack “force a GC to remove naked pointers”, and you could use jemalloc as your malloc with option opt.retain to prevent its address space from being reused (if there is any risk).

As an update, what I want to do next with the draft is to see if I can fit any of it within the page limit of the ML workshop and to submit it there.

If allocation overhead is an issue, you could tag the out-of-heap pointers to look like integers, provided your out-of-heap pointers are aligned. The OCaml GC doesn’t follow integers, so the implementation would be sound. Of course, this presupposes that you find all the places where the naked pointers are. I hope this checker will be useful for that.
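A minimal sketch of that tagging scheme in plain C. It relies on the OCaml convention that a word with its low bit set is an immediate integer, which the GC does not follow. The type `word` is a stand-in for the runtime’s `value`, and `looks_like_int`, `hide_pointer`, and `reveal_pointer` are hypothetical helper names, not runtime functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OCaml's `value`: a machine word whose low bit is 1 for an
   immediate integer and 0 for a pointer that the GC may follow. */
typedef intptr_t word;

/* True if the word would be treated as an integer, i.e. never scanned. */
static int looks_like_int(word w)
{
    return (w & 1) != 0;
}

/* Disguise an out-of-heap pointer as an integer by setting its low bit.
   The pointer must be at least 2-byte aligned so that the bit is free. */
static word hide_pointer(void *p)
{
    assert(((uintptr_t)p & 1) == 0);
    return (word)((uintptr_t)p | 1);
}

/* Recover the original pointer by clearing the tag bit. */
static void *reveal_pointer(word w)
{
    assert(looks_like_int(w));
    return (void *)((uintptr_t)w & ~(uintptr_t)1);
}
```

On the OCaml side such a word should be kept behind an abstract type, since arithmetic on it would be meaningless. Every C stub that receives it has to clear the tag bit before dereferencing.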

Are they?

Yes. Two libraries that have not been actively used or developed in at least half a decade, one of which uses naked pointers unsafely, reinforce my point rather than counter it. Also, IIUC ancient can work fine without naked pointers – it just needs to mark blocks black as it copies them out of the heap.

Disagreeing on a point of design is one thing. Alleging a lack of proper care is something else.