How to keep OCaml races bounded in space and time in the presence of C stubs?

I have a question about the interaction between the OCaml and C (and C++?) memory models when writing OCaml bindings for C code (that is itself a binding for C++ code).

I am thinking about code such as the following:

value llvm_param_types(value FunTy) {
  unsigned Length = LLVMCountParamTypes(clear_low_bit(FunTy));
  value Tys =
    ( Length <= Max_young_wosize
    ? caml_alloc_small(Length, 0)
    : caml_alloc_shr(Length, 0) );
  LLVMTypeRef* LLTys = (LLVMTypeRef *)Op_val(Tys);
  LLVMGetParamTypes(clear_low_bit(FunTy), LLTys);
  for (unsigned I = 0; I < Length; ++I)
    Op_val(Tys)[I] = set_low_bit(LLTys[I]);
  return Tys;
}

The context for this code is @alan’s work (D136400, D136537) to make the LLVM bindings compatible with OCaml 5. Pointers to LLVM objects will be aligned and naked pointers are avoided by setting the low bit when passing them from C to OCaml to make them look like integers to the OCaml GC, and likewise clearing the low bit when passing them from OCaml to C.

This code aims to be a binding for LLVMGetParamTypes and should accept an LLVM function type and return an OCaml array of the LLVM types of its parameters. It allocates a block on the OCaml heap of the needed size, passes a pointer to the first slot to LLVMGetParamTypes which will populate it, and then iterates over the elements to set the low bit of each.

This code violates the FFI rules since a block allocated with caml_alloc_shr is initialized using plain non-atomic C stores rather than going through caml_initialize, and the stores to update with the low bit set are also plain non-atomic C stores rather than using Field.

My understanding is that in a sequential setting this is safe, since the OCaml runtime cannot run between the time the allocation is done and the time the block is fully initialized with integer values, so the GC cannot see uninitialized or naked pointers, and no cross-generation pointers can be created.

In the context of a concurrent program that has no data races, my understanding is that this is still safe.

But in the context of a concurrent program that has races, my understanding is that even though all the writes are to freshly-allocated not-yet-published memory, the C memory model does not guarantee that e.g. the domain executing this code will read its own writes when setting the low bits. Am I being too pessimistic? I am unsure what should be done in low-level C stubs to ensure that races in OCaml code do not lead to the C stub code pulling in undefined behavior. Would it suffice to make the final store to each location volatile by changing the body of the for loop to:

    Field(Tys, I) = set_low_bit(LLTys[I]);

Or would more be needed?

Any pointers to what I ought to be reading to answer this sort of question would be much appreciated.

Thanks, Josh

I don’t really have an answer to your questions, but a few things seem fishy.
First, there are races here: even though it may not look like your object is published to other threads, the major heap can be scanned outside of poll points, and since this code can allocate on the major heap, that is a problem. So I think this code is really unsafe as is.
Also a note: the dance around Max_young_wosize seems unnecessary: just use the basic caml_alloc.

What I would recommend is to fill a temporary alloca’ed array and copy from it into the OCaml block. That would really feel safer.

Another option: if you build arrays that you know will only ever contain C pointers, you might just store them in strings (behind an abstract type to make it cleaner).

After reading more, including here, here, and here, it seems that this sort of thing is not entirely straightforward, and that it ultimately involves relying on C compilers not being overly malicious.

So an approach that seems to be workable is to:

  • Only allow plain, non-atomic C loads and stores to access memory the OCaml runtime system does not know about.

  • Use volatile loads and stores to access memory the OCaml runtime system may know about.

  • Use Store_field/caml_modify if there is any chance of storing a pointer to a young object.

This is just a coarse set of rules I can follow, but I’m not confident in them since I do not fully understand the reasoning that has gone into the discussions linked above. Any comments would be very welcome.

For the example in the original post, this could be done with the following code:

value llvm_param_types(value FunTy) {
  unsigned Length = LLVMCountParamTypes(clear_low_bit(FunTy));
  LLVMTypeRef *Temp = malloc(sizeof(LLVMTypeRef) * Length);
  if (Temp == NULL && Length != 0)  /* malloc(0) may legitimately return NULL */
    caml_raise_out_of_memory();
  LLVMGetParamTypes(clear_low_bit(FunTy), Temp);
  value Tys = caml_alloc(Length, 0);
  for (unsigned I = 0; I < Length; ++I)
    Field(Tys, I) = set_low_bit(Temp[I]);
  free(Temp);
  return Tys;
}

(Here Temp is allocated with malloc just to avoid any potential questions about how much memory is too much to stack-allocate, but could be optimized.)

I don’t understand the comment about the original version being racy. In particular, when I read the implementation of caml_alloc it is not clear to me why it is more safe than the example I posted above. My understanding is that if a block is allocated on the major heap, then the GC will not scan it before it is reachable from some GC root. It seems that caml_alloc relies on this as it allocates uninitialized memory and then loops over it and initializes each element to Val_unit.

So it seems that it should be safe to replace

  value Tys = caml_alloc(Length, 0);

with

  value Tys =
    ( Length <= Max_young_wosize
    ? caml_alloc_small(Length, 0)
    : caml_alloc_shr(Length, 0) );

to avoid the cost of double-initialization.

A goal of the bindings is for the OCaml client code to be able to operate on the individual values in the array, passing them back to the Llvm library individually. So unless I have misunderstood the suggestion, I don’t think that an encoding using strings would work.

> I don’t understand the comment about the original version being racy. In particular, when I read the implementation of caml_alloc it is not clear to me why it is more safe than the example I posted above. My understanding is that if a block is allocated on the major heap, then the GC will not scan it before it is reachable from some GC root. It seems that caml_alloc relies on this as it allocates uninitialized memory and then loops over it and initializes each element to Val_unit.

This is also my understanding: you are essentially initializing the block. Maybe the memory model documentation would benefit from being more precise about what counts as an initialization.
