I have a question about the interaction between the OCaml and C (and C++?) memory models when writing OCaml bindings for C code (that is itself bindings for C++ code).
I am thinking about code such as the following:
```c
value llvm_param_types(value FunTy) {
  unsigned Length = LLVMCountParamTypes(clear_low_bit(FunTy));
  value Tys =
    ( Length <= Max_young_wosize
      ? caml_alloc_small(Length, 0)
      : caml_alloc_shr(Length, 0) );
  LLVMTypeRef* LLTys = (LLVMTypeRef *)Op_val(Tys);
  LLVMGetParamTypes(clear_low_bit(FunTy), LLTys);
  for (unsigned I = 0; I < Length; ++I)
    Op_val(Tys)[I] = set_low_bit(LLTys[I]);
  return Tys;
}
```
The context for this code is @alan’s work (D136400, D136537) to make the LLVM bindings compatible with OCaml 5. Pointers to LLVM objects are aligned, so naked pointers can be avoided by setting the low bit when passing them from C to OCaml, which makes them look like integers to the OCaml GC, and by clearing the low bit when passing them from OCaml back to C.
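A minimal sketch of what the low-bit tagging helpers could look like, written in plain C so it compiles without the OCaml or LLVM headers. The names set_low_bit and clear_low_bit come from the code above, but these bodies are assumptions, not the definitions in D136400/D136537, and `value` here is a hypothetical stand-in for OCaml's word-sized type. The idea it illustrates: OCaml treats a word with bit 0 set as an immediate (Is_long), so an aligned pointer with that bit set is never followed by the GC.
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for OCaml's `value`: a word-sized integer. */
typedef intptr_t value;

/* Tag an aligned C pointer so that bit 0 is set, which OCaml's GC reads as
 * an immediate integer rather than a pointer. Assumes at least 2-byte
 * alignment, so bit 0 of the raw pointer is always clear. */
static value set_low_bit(const void *ptr) {
  uintptr_t bits = (uintptr_t)ptr;
  assert((bits & 1) == 0); /* the alignment assumption */
  return (value)(bits | 1);
}

/* Undo the tagging to recover the original C pointer. */
static void *clear_low_bit(value v) {
  return (void *)((uintptr_t)v & ~(uintptr_t)1);
}
```
For instance, with `int x;`, `set_low_bit(&x)` has bit 0 set, and `clear_low_bit(set_low_bit(&x))` compares equal to `(void *)&x`.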
This code aims to be a binding for LLVMGetParamTypes and should accept an LLVM function type and return an OCaml array of the LLVM types of its parameters. It allocates a block of the needed size on the OCaml heap, passes a pointer to the first slot to LLVMGetParamTypes, which populates it, and then iterates over the elements to set the low bit of each.
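The in-place trick described above (let the LLVM call write raw pointers into the block's slots, then overwrite each slot with its tagged form) can be modeled in plain C. This is a sketch under stated assumptions: fake_get_param_types is a hypothetical stand-in for LLVMGetParamTypes, the slots are ordinary malloc'd memory rather than an OCaml block, and pointers are assumed to be the same size as `value` (checked at compile time).
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t value; /* hypothetical stand-in for OCaml's value */

/* The slots are reused for raw pointers, so both must be one word wide. */
_Static_assert(sizeof(void *) == sizeof(value), "slots must be pointer-sized");

static value set_low_bit(const void *ptr) {
  return (value)((uintptr_t)ptr | 1);
}

/* Hypothetical stand-in for LLVMGetParamTypes: copies n pointers into dest. */
static void fake_get_param_types(void *const *params, unsigned n, void **dest) {
  for (unsigned i = 0; i < n; ++i)
    dest[i] = params[i];
}

/* Same steps as the stub: view the slots as a pointer array, let the callee
 * fill it, then replace each raw pointer with its tagged form in place. */
static void fill_then_tag(void *const *params, unsigned n, value *slots) {
  assert(slots != NULL || n == 0);
  void **raw = (void **)slots; /* like (LLVMTypeRef *)Op_val(Tys) */
  fake_get_param_types(params, n, raw);
  for (unsigned i = 0; i < n; ++i)
    slots[i] = set_low_bit(raw[i]); /* like Op_val(Tys)[I] = set_low_bit(LLTys[I]) */
}
```
Calling fill_then_tag on a malloc'd array of n values leaves slot i holding the i-th input pointer with bit 0 set.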
This code violates the FFI rules, since a block allocated with caml_alloc_shr is initialized using plain non-atomic C stores rather than going through caml_initialize, and the stores that set the low bit are also plain non-atomic C stores rather than going through Field.
My understanding is that in a sequential setting this is safe: the OCaml runtime cannot run between the allocation and the point where the block is fully initialized with integer values, so the GC cannot see uninitialized or naked pointers, and no cross-generation pointers can be created.
In the context of a concurrent program that has no data races, my understanding is that this is still safe.
But in the context of a concurrent program that has races, my understanding is that even though all the writes are to freshly allocated, not-yet-published memory, the C memory model does not guarantee that, e.g., the domain executing this code will read its own writes when setting the low bits. Am I being too pessimistic?
I am unsure what should be done in low-level C stubs to ensure that races in OCaml code do not cause the C stub code to exhibit undefined behavior. Would it suffice to make the final store to each location volatile by changing the body of the for loop to:
```c
Field(Tys, I) = set_low_bit(LLTys[I]);
```
Or would more be needed?
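To make the proposed change concrete, here is a plain-C model of the two loop bodies. FIELD is modeled on, and assumed to match, OCaml 5's Field macro, which accesses the slot through a volatile-qualified pointer; OP_VAL mirrors Op_val, which does not. The names FIELD, OP_VAL, tag_plain and tag_volatile are hypothetical, `value` is again a stand-in for OCaml's type, and for simplicity the pointers come from a separate array rather than from the slots themselves. In a single thread both loops leave the same contents; the only difference is that each store in the second loop is a volatile access.
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t value; /* hypothetical stand-in for OCaml's value */

/* Models of the runtime macros (assumed, not copied from caml/mlvalues.h):
 * OP_VAL gives a plain pointer to the fields, FIELD a volatile lvalue. */
#define OP_VAL(x) ((value *)(x))
#define FIELD(x, i) (((volatile value *)(x))[i])

static value set_low_bit(const void *ptr) {
  return (value)((uintptr_t)ptr | 1);
}

/* Current loop: plain, non-volatile stores through OP_VAL. */
static void tag_plain(value block, void *const *ptrs, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    OP_VAL(block)[i] = set_low_bit(ptrs[i]);
}

/* Proposed loop: each store goes through the volatile FIELD lvalue. */
static void tag_volatile(value block, void *const *ptrs, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    FIELD(block, i) = set_low_bit(ptrs[i]);
}
```
Calling both on separate malloc'd buffers (cast to `value`, as OCaml does with block pointers) with the same pointer array leaves identical contents.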
Any pointers to what I ought to be reading to answer this sort of question would be much appreciated.
Thanks, Josh