[BLOG] A (Possibly) Safer Interface to the Ctypes FFI

Hi there, another blog post.

This time I discuss ideas for a new interface that helps localize the possibilities of errors when working with a Ctypes-style FFI. Comment below if you like/hate it please!

Matt

I’m not sure why we need to manage memory allocation on the OCaml side for Ctypes.

I can see a potential performance advantage, but a hand written C binding would copy the string internally in the stub if needed, and you’d have a safe way to deal with lifetimes: the CAMLparam macro that registers it as a GC root, and you pass the OCaml value itself as an argument (not an unsafe pointer to something held by another OCaml value).

It’d be good if the manual memory management was opt-in in Ctypes, and there was a default interface that is safer and doesn’t require handling of raw pointers.
At least when the C stubs mode is used to generate C bindings. I can see why it’d be needed in the libffi mode.


Small comment on your approach: you could try to use a tree instead of a list to avoid having to concatenate lists in bind. All we need here is to keep the values alive, so a tree of tuples might work too if you add another case to your GADT?

At least when the C stubs mode is used to generate C bindings. I can see why it’d be needed in the libffi mode.

This is indeed talking about the libffi mode. I can’t use the C stubs mode for the bindings I’m making because I have to compile the OCaml code as an .so and then call a function that is passed to me in order to access Godot API function pointers.

Regardless, I am a little perplexed: if a C binding is passed an OCaml string or some other array/pointer type that has been allocated using allocate, I have observed the crashes mentioned in the blog post. Are you saying that in C stubs mode these crashes would not occur? My understanding was that they happen because the OCaml garbage collector doesn’t know it can’t throw away the off-heap memory these values point to, since C might have a pointer to it too.

I can see a potential performance advantage, but a hand written C binding would copy the string internally in the stub if needed, and you’d have a safe way to deal with lifetimes: the CAMLparam macro that registers it as a GC root, and you pass the OCaml value itself as an argument (not an unsafe pointer to something held by another OCaml value).

Indeed, though these are not handwritten C bindings and so CAMLparam is not available to me.

Regarding my approach (forgot to mention) I like the idea of using a tree. It would in fact be much easier than you’d think: since Dep : 'a -> dep, I can simply apply Dep to a tuple!
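
A minimal sketch of that idea, using only the standard library. Only `Dep : 'a -> dep` comes from the discussion above; the record type `kept` and the functions `return`, `add_dep`, `bind` and `run` are hypothetical names chosen for illustration and are not the Living library's actual API. The point is that combining two sets of dependencies becomes a constant-time tuple allocation instead of a list concatenation.

```ocaml
(* Existential box: holds any value purely so that it stays reachable. *)
type dep = Dep : 'a -> dep

(* A value paired with everything that must outlive its use.
   [deps] is a tree built from nested tuples rather than a list. *)
type 'a kept = { value : 'a; deps : dep }

(* No dependencies yet: use unit as the empty tree. *)
let return x = { value = x; deps = Dep () }

(* Attach one more dependency in O(1) by nesting the old tree in a tuple. *)
let add_dep d k = { k with deps = Dep (d, k.deps) }

(* Combine both dependency trees in O(1); no list concatenation needed. *)
let bind k f =
  let r = f k.value in
  { value = r.value; deps = Dep (k.deps, r.deps) }

(* Extract the value. Sys.opaque_identity is commonly used to stop the
   optimizer from treating [deps] as dead before this point. *)
let run k =
  let v = k.value in
  ignore (Sys.opaque_identity k.deps);
  v

let () =
  let buf = Bytes.create 16 in          (* stand-in for FFI-allocated memory *)
  let a = add_dep buf (return 1) in
  let b = bind a (fun n -> return (n + 1)) in
  assert (run b = 2)
```

Since nothing ever inspects the dependencies, the shape of the tree does not matter; it only has to keep every node reachable.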

I mean that in C stubs mode Ctypes could generate code that is safer, i.e. more similar to what the hand-written function would look like.
But because C stubs mode and libffi mode have the same interface, I don’t think that is currently possible (unless you abandon libffi mode, but as you say that mode is useful too).

In particular it’d be good if some solution could be found in Ctypes itself, to ensure that the ptr type holds references to the OCaml value that “holds” the memory allocated, and that this is registered as a root on the C side. Looking at the current implementation it does seem to contain some code to track this, but perhaps this is not complete when used in FFI mode.

Although if you use libffi mode then there isn’t any “C side” to register the roots on, perhaps the @-> operator could do the kind of dependency tracking that you implemented in the living module? (e.g. build up a nested tuple of all the arguments, so we hold it all alive while the call is running?).

Although if you use libffi mode then there isn’t any “C side” to register the roots on, perhaps the @-> operator could do the kind of dependency tracking that you implemented in the living module? (e.g. build up a nested tuple of all the arguments, so we hold it all alive while the call is running?).

The problem is unfortunately worse than this. It can occur after the call returns, since C can return a pointer into the structure you pass in, or store the data you pass in, so I don’t think it can be fixed directly in Ctypes without (likely unwanted) overhead.

Perhaps there should be a way to declare this at the time the function’s signature is defined.
Not all functions work this way; it is only needed when the function returns some kind of pointer, and in that case you could declare, when you define the function, that the result depends on one of the arguments:

let arg_returned = returned_ref <.... some type > in
... @-> arg_returned @-> returning (references arg_returned <type>)

(So you don’t need to keep everything alive, just the argument that holds the memory).
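
A rough sketch of what such a declaration could boil down to at run time, written against the standard library only. The names `tied`, `returning_ref_to_arg` and `with_tied` are hypothetical and exist neither in Ctypes nor in Living; the wrapped function here is a plain OCaml stand-in for a C function that returns a pointer into its argument.

```ocaml
(* Existential box used only to keep a value reachable. *)
type dep = Dep : 'a -> dep

(* A result tied to the argument whose memory it refers to. *)
type 'r tied = { value : 'r; owner : dep }

(* Hypothetical helper: wrap a one-argument binding so that its result
   carries a reference to the argument it was called with. *)
let returning_ref_to_arg (f : 'a -> 'r) : 'a -> 'r tied =
  fun arg -> { value = f arg; owner = Dep arg }

(* Use the result while the owner is guaranteed to stay reachable.
   Sys.opaque_identity is commonly used to keep [owner] from being
   treated as dead before the continuation has finished. *)
let with_tied (t : 'r tied) (k : 'r -> 'b) : 'b =
  let out = k t.value in
  ignore (Sys.opaque_identity t.owner);
  out

let () =
  (* Stand-in for a C function returning an interior pointer:
     here it just returns an index into the buffer. *)
  let find_c = returning_ref_to_arg (fun b -> Bytes.index b 'c') in
  let t = find_c (Bytes.of_string "abc") in
  assert (with_tied t (fun i -> i) = 2)
```

Only the one argument named in the declaration is retained, which matches the remark that not everything has to be kept alive.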

ptr already appears to be capable of tracking dependencies: CPointer has an ‘Obj.t option’ field, so the overhead of keeping track of this may not be that big.

Perhaps this could be initially prototyped using your Living module, and eventually backported into Ctypes itself if we find a way to ensure lifetime safety.

This is definitely an option, but requires you to keep the return value alive then, and would only work with returning a pointer. For example, something returning unit would not work with this approach, like a void store_string(const char* my_string) function on the C side that simply stores my_string in some static and opaque structure.

I like the idea of using Living for prototyping though.

I thought that’s what Ctypes fat pointers are for? (But I haven’t used them so far so just guessing.)

In that case I’d usually copy the string, unless I want to very precisely match the lifetime of C data structures with OCaml values (e.g. by using finalizers and Custom tags).
You’re right that the lifetime annotations can become more complicated than just “the return value uses this”. There would probably also need to be an annotation for stores, in which case a copy should probably be made, unless the lifetime of the OCaml value is known to exceed that of the stored pointer (e.g. because it is referenced from a global).
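
As a small illustration of the "referenced from a global" case, here is a sketch of a global retain table, using only the standard library. The names `retained`, `retain` and `release` are hypothetical. It assumes the value being retained is an OCaml handle whose reachability keeps off-heap memory alive (as with memory obtained from Ctypes' allocate); it does not pin values that live on the OCaml heap itself.

```ocaml
(* Existential box used only to keep a value reachable. *)
type dep = Dep : 'a -> dep

(* Global table: anything stored here is reachable from a root, so the
   GC will not collect it until it is explicitly released. *)
let retained : (int, dep) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* Keep [v] alive indefinitely; returns a ticket for releasing it later. *)
let retain (v : 'a) : int =
  let id = !next_id in
  incr next_id;
  Hashtbl.replace retained id (Dep v);
  id

(* Call once the C side is known to have dropped its stored pointer. *)
let release (id : int) : unit = Hashtbl.remove retained id

let () =
  let buf = Bytes.of_string "hello" in   (* stand-in for FFI-allocated memory *)
  let ticket = retain buf in
  (* ... here a C function such as store_string would be handed the pointer ... *)
  assert (Hashtbl.mem retained ticket);
  release ticket;
  assert (not (Hashtbl.mem retained ticket))
```

The trade-off is the usual one for manual retention: forgetting to call `release` leaks memory, and calling it too early brings back the original crash.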

The more I look at this it looks like we’d end up with something similar to Rust’s lifetime annotations.
But perhaps we could use modality (Jane Street Tech Blog - Oxidizing OCaml: Locality) instead of lifetime annotations.
If we’d wrap a C argument with (global argtype) then the lifetime might be global (or at least exceed that of the current caller), and Ctypes should copy all arguments that aren’t global.
On the other hand, if we wrap it with

let scoped_arg = scope argtype in
let scoped_arg2 = scope ~ref:scoped_arg argtype2 in
let scoped_ret = scope ~ref:scoped_arg2 in
foreign "func1" (scoped_arg @-> scoped_arg2 @-> float @-> returning scoped_ret)

then argument2 needs to be alive as long as arg1 is alive, and arg2 needs to be alive as long as the return value is. This probably needs list/tree building similar to the Living module.

This is still very fragile, though: it relies on the user to declare the lifetimes correctly (e.g. based on what a C manual page says), and there are no safeguards if you get it wrong.
OTOH this could be an opt-in “high-performance” interface, and a safer default interface could always copy the arguments (where that is possible; obviously you wouldn’t want to copy an array that might be gigabytes in size).

I’ve written a blog post on how living currently handles this situation. Please take a look: [BLOG] A Tour of the Living Library -- A Safer FFI. I think it’s the best you can do without linear/affine types.