I’m a bit confused by the official OCaml documentation regarding C interfaces.
Rule 1 of “living in harmony with the GC” is:
A function that has parameters or local variables of type value must begin with a call to one of the CAMLparam macros and return with CAMLreturn, CAMLreturn0, or CAMLreturnT.
However, the same documentation proposes this translator, which receives a parameter of type value but doesn’t follow this rule:
/* Extract the pointer encapsulated in the given OCaml value */
static ty * typtr_of_val(value v)
{
    return *((ty **) Data_abstract_val(v));
}
Does the rule only apply to primitives or am I missing something else?
The rule applies to any function that invokes the GC directly or indirectly (e.g. one that allocates). If you can guarantee that this is not the case, then the macros are not necessary. However, when in doubt, you can always use the macros: they won’t cause any problems if they are added to a function that did not, strictly speaking, need them.
I agree that it is best to default to using the macros. The one place I know of where adding them is not benign is in code that can be called by the custom operations of custom blocks, see section 9.1. Such operations run at “inopportune” times and must not register GC roots.
Well actually, slightly more confusion, in the same example in the manual:
/* Create an OCaml value encapsulating the pointer p */
static value val_of_typtr(ty * p)
{
    value v = caml_alloc(1, Abstract_tag);
    *((ty **) Data_abstract_val(v)) = p;
    return v;
}
This interacts with the gc (as it allocates) but does not make use of the macros…
Is it a kind of low-level alloc that doesn’t require the macros? In general, I think it would be helpful if the OCaml manual said what these macros do, even just as a high-level intuition. It spends time saying where I should write them, but doesn’t quite say why, which leaves the reader without an understanding of what they are doing. I’d be happy to write this down, but I’m not sure where I could find the knowledge.
But note that that function does not have a value parameter (and no local value is live across an allocation).
What these macros do is register their arguments as GC roots, and the CAMLreturn macros unregister them. The low-level interface described in the manual gets into this a bit more, but my impression is that there is an overall desire not to just dump the implementation into the documentation.
FWIW I remember a newcomer being annoyed at being told to use these macros without understanding why she had to. So I think it’s worth adding something.
The exact condition is that any variable of value type (parameter, local variable, intermediate result) for which there is at least one GC invocation between its definition and its use must be registered with the GC (using CAMLparam, CAMLlocal, etc). This is because whenever the GC runs it may move heap-allocated blocks around, and so it must know about all pointers to such blocks in order to update them with new addresses.
However, it is easy to make mistakes when computing this condition, so the usual suggestion is to register everything if there is at least one GC invocation in the function.
In this example, there is no GC invocation between the definition of v and its use, so no GC registration is needed.
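For contrast, here is a minimal sketch of a function where the condition does hold, so registration is needed. The stub name make_pair_stub and the string contents are made up for illustration; the macros and allocation functions are the ones from the OCaml headers.

```c
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>

/* Hypothetical stub: returns the OCaml pair ("hello", "world"). */
value make_pair_stub(value unit)
{
    CAMLparam1(unit);                 /* register the parameter */
    CAMLlocal3(first, second, pair);  /* register the locals (initialised to Val_unit) */

    first = caml_copy_string("hello");   /* allocates */
    /* This second allocation may run the GC and move the block that
       `first` points to. Because `first` is registered, the GC updates it. */
    second = caml_copy_string("world");
    pair = caml_alloc_tuple(2);          /* allocates again */
    Store_field(pair, 0, first);
    Store_field(pair, 1, second);
    CAMLreturn(pair);                    /* unregisters everything, then returns */
}
```

Without CAMLlocal, `first` would be defined before an allocation and used after it, which is exactly the situation the condition above describes.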
The definitions of the root registration macros CAMLlocal and CAMLparam are in memory.h.
What they do is very simple: the runtime keeps a global linked list of roots (= pointers to heap-allocated blocks) it knows about. CAMLparam* and CAMLlocal* macros simply push elements to this list (using it as a stack), and CAMLreturn pops them out.
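To make that intuition concrete, here is a toy model in plain C that does not use the OCaml headers at all. Every name in it (toy_value, toy_root, toy_push, toy_gc_move, and so on) is invented for this sketch, and the real runtime structure is more elaborate (for instance, it groups several variables per list node). The idea is the same, though: the list stores the addresses of C variables, so that a collection can rewrite them.

```c
#include <stddef.h>

typedef long toy_value;   /* stand-in for OCaml's `value` */

/* One node per registered variable: it records the address of the
   C variable that holds the value, not the value itself. */
struct toy_root {
    toy_value *slot;
    struct toy_root *next;
};

static struct toy_root *toy_roots = NULL;   /* top of the root stack */

/* Roughly what CAMLparam/CAMLlocal do: push a node. */
static void toy_push(struct toy_root *node, toy_value *slot)
{
    node->slot = slot;
    node->next = toy_roots;
    toy_roots = node;
}

/* Roughly what CAMLreturn does: restore the stack to its state on entry. */
static void toy_pop_to(struct toy_root *saved)
{
    toy_roots = saved;
}

/* Toy "collection": pretend every block moved by `delta` bytes and
   fix up every registered variable accordingly. */
static void toy_gc_move(long delta)
{
    for (struct toy_root *r = toy_roots; r != NULL; r = r->next)
        *r->slot += delta;
}

/* Returns 1 if the registered variable followed the move while the
   unregistered one was left holding the stale address. */
static int toy_demo(void)
{
    struct toy_root *saved = toy_roots;
    toy_value registered = 1000;     /* pretend address of a heap block */
    toy_value unregistered = 1000;   /* same block, but unknown to the GC */
    struct toy_root node;

    toy_push(&node, &registered);
    toy_gc_move(64);                 /* the block now lives at 1064 */

    int ok = (registered == 1064) && (unregistered == 1000);
    toy_pop_to(saved);
    return ok;
}
```

In this model the unregistered variable still holds the old address after the move, which is the dangling-pointer situation the macros exist to prevent.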
(Just making sure I’m catching this right)
This is only a problem because caml_copy_string(s1) and caml_copy_string(s2) are given back to OCaml, and therefore there is a chance for the GC to be triggered, right?
Not exactly. The issue is that caml_copy_string allocates memory from the GC heap, so the second application of that function may trigger a collection which moves the string returned by the first application. (The order of evaluation of the arguments of caml_callback2_exn, or any other function, is unspecified, but is in fact right to left, if that is of interest.)
Edit: On order of evaluation of arguments, as I mentioned the OCaml compiler in fact evaluates right to left (but that could change), but the example given is C code. In C the order of evaluation of arguments is also unspecified, but gcc I believe evaluates left to right. This doesn’t affect the outcome, which is that in the example given at least one of the strings passed to caml_callback2_exn should be protected by a CAMLlocal macro.
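A minimal sketch of one way to protect both strings follows. The helper name call_with_strings and its signature are made up for illustration, since the code being discussed is not reproduced here. It registers both strings rather than only one, in line with the “register everything” advice.

```c
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>
#include <caml/callback.h>

/* Hypothetical helper: applies the OCaml closure `f` to two C strings. */
value call_with_strings(value f, const char *s1, const char *s2)
{
    CAMLparam1(f);      /* the closure itself may be moved by an allocation */
    CAMLlocal2(a, b);

    a = caml_copy_string(s1);   /* allocates */
    b = caml_copy_string(s2);   /* allocates: may move `a`, which is registered */

    /* Both arguments are now ordinary registered variables, so the order in
       which the C compiler evaluates call arguments no longer matters.
       The caller should check the result with Is_exception_result. */
    CAMLreturn(caml_callback2_exn(f, a, b));
}
```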
Ah, I didn’t realise caml_copy_string would allocate memory, but looking at its implementation it does. I wonder if it’s possible to add some kind of doc-string to the header files so that it appears in the IDE when hovering over the function. I’ll add it to my list of things to look at.
It would certainly be an improvement if the examples in the manual religiously followed the rules (i.e. all the examples used the CAMLparam macros without exception), with perhaps a note to explain that the OCaml runtime and codebase are not always the best example of good practice (because they live on the “wild side”, sometimes for performance reasons and sometimes owing to the code predating the macros).