Ctypes has been my go-to for writing C stubs. However, it requires a lot of work on the OCaml side (allocation, finalization), and the build is far from straightforward since the OCaml code needs information that only the C compiler knows.
Camlidl takes an approach that is a lot simpler and still customizable: from the signature of a C function, it generates both the C and the ML code. Moreover it supports custom_block. It just needs to be a little more customizable for some common cases:
initialization: some out parameters need to be initialized by calling a library function before making the C call
context: libraries can require passing a common data structure to all functions, like finalization and initialization
reference counting
However, I'm wondering whether this is pushing the design of Camlidl too far. What are you using for your bindings? Is SWIG useful? It seems to quickly leave you doing everything by hand (e.g. arrays with an unknown length).
I tried writing a POC for an OCaml lib that would generate the C code, so that it is easy to customize and you get to write in a language with LSP support. But the description algebra is not satisfying yet (though Format with tags is fun), and I don't want to add a spurious alternative to the ecosystem.
It depends on the design of the C library itself and how you decide to bind to it (and the extent of the API surface area you need to bind to). In general I think it's better to do thin, mostly one-to-one bindings, even if unsafe, and make them safe behind a module using OCaml code.
If that works for you then personally I think the bare OCaml FFI is fine. There are a few patterns that should be better described in the manual (mostly for custom values, finalized or not, to store C pointers and different strategies on how to deal with enums) but it ends up being rather straightforward.
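To make the "thin bindings, made safe behind a module" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `Raw`, `Db`, `db_open` and `db_close` are made-up names, and the `Raw` functions are plain OCaml stand-ins for what would be `external` declarations backed by C stubs, so that the sketch runs on its own.

```ocaml
(* Hypothetical thin layer. In a real binding these would be [external]
   declarations pointing at C stubs; they are OCaml stand-ins here. *)
module Raw = struct
  type handle = int (* stand-in for a C pointer *)

  let open_count = ref 0

  (* Returns a negative handle on failure, like many C APIs do. *)
  let db_open (path : string) : handle =
    if path = "" then -1 else (incr open_count; !open_count)

  (* Returns 0 on success, -1 on error. *)
  let db_close (h : handle) : int = if h > 0 then 0 else -1
end

(* Safe layer: abstract type, errors as [result], closing twice is harmless. *)
module Db : sig
  type t
  val open_ : string -> (t, string) result
  val close : t -> unit
  val is_open : t -> bool
end = struct
  type t = { handle : Raw.handle; mutable closed : bool }

  let open_ path =
    let h = Raw.db_open path in
    if h < 0 then Error ("cannot open " ^ path)
    else Ok { handle = h; closed = false }

  let close t =
    if not t.closed then begin
      ignore (Raw.db_close t.handle);
      t.closed <- true
    end

  let is_open t = not t.closed
end
```

The unsafe part stays one-to-one with the C API and is easy to audit against the header; all the policy (error handling, double-close protection) lives in ordinary OCaml code.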
Here is a recent example of mine: the surface API is not small but the C API is clean, so the amount of glue is minimal and it's mostly bare C calls with the appropriate conversions on arguments.
Unfortunately, I think camlidl is not extensible enough to have a conversion that uses the result and is written at the same time. It is perhaps possible with SWIG. With camlid it should be possible, but it's a POC.
Do you think such a declarative way would be simpler to write and maintain?
I used to use camlidl for a ton of stuff. For the reasons you mention. Then came a day when I wanted to wrapper a large C++ library (Rocksdb). And it had a C API (wrappered around the C++), so I started doing it, and … wow, that was a lot of work. B/c you had to write so much memory-management code. And the API was complicated, so … a lot of work.
Now, Rocksdb (descended from Leveldb) has one thing going for it: it adheres to the Google C++ Style Guide calling conventions. So memory ownership is predictable based on method type-signatures. And all memory-allocated objects have well-understood lifetimes (again b/c of adherence to the Style Guide). So it's possible to generate the C stub from the C++ type signature, using template-based metaprogramming.
A long time ago, I did just that, in a little project called "cppffigen". I used it to wrapper Rocksdb, and it was great. It's been a long, long time since I hacked on that stuff. I should go update it. Not suggesting you should use any of it, but the basic idea of using C++ templates as a basis for code-generation is a good one. I've noticed that Rust uses its traits/typeclasses in pretty much the same way, for the same purposes.
Maybe, maybe not :-) But I think I'm generally happier with low level technology. The problem with stub generators is that:
You need to learn (and remember for maintenance) a new language.
You need to make sure the binding patterns you are going to need are supported by the language (e.g. you seem to suggest that the one I took in that binding wouldn't be).
The day there is a problem you need to (re)understand the language and the resulting code generation.
It's yet another tool you need to make sure gets maintained over the years.
For this reason I'm personally more inclined to low level WYSIWYG technology for FFI bindings (also the approach I took for JavaScript in brr).
While you may pay a higher initial binding cost (though a bit of M-x query-replace-regexp semi-automation helps with transforming certain API patterns), I'd say that in the long term it's easier for maintenance, auditing and spotting problems. It's also more friendly to drive-by contributors and/or reviewers, if they know the FFI of the language of course.
But then I don't say that's always the approach that should be taken. For example for tgls I generate the (ctypes-based) bindings because there are XML files that describe the API.
P.S. The thing you mention you would have abstracted away. I remember, in the past, metaprogramming bindings with horrendous C macros to be as DRY as possible. But then, faced with a segfault, it would become hard to assess the correctness of this dryness. Especially when I'd get the problem a couple of months later and had forgotten about my nice macro-level atrocities :-)
By the way, except for ownership and callbacks from C, this is by far the most annoying thing when binding to C.
For example in tsdl (ctypes-based) I have a C program that generates OCaml code with the constants. Another approach is to use a plain variant and have a mapping table on the C side (example in Unix), with the disadvantage that you need to keep that table in sync with the OCaml source. Both approaches are fine if you don't need the inverse map; if you need to pattern match on the enum in OCaml (versus just using it to specify stuff to the C API) it becomes extra annoying.
I remember that at some point I was actually pre-processing OCaml sources with cpp and the library C includes in order to get those constants. Quite ugly, but it worked; I'm no longer sure why I stopped doing that :-)
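As an illustration of why the inverse map is the annoying part, here is a small sketch of the variant-plus-table approach. It keeps the table on the OCaml side only so that it runs without C stubs (the Unix library keeps its table in C); the `level` type and the integer values are made up and would normally come from the C header.

```ocaml
(* Hypothetical enum; the integers stand in for constants from a C header. *)
type level = Debug | Info | Warn | Fatal

(* Single table used for both directions, so there is one place to keep in
   sync with the C side. *)
let table = [ Debug, 10; Info, 20; Warn, 30; Fatal, 40 ]

(* OCaml -> C: total, every constructor is in the table. *)
let to_c (l : level) : int = List.assoc l table

(* C -> OCaml: partial, the C side may return a value we do not know. *)
let of_c (n : int) : level option =
  List.find_map (fun (l, c) -> if c = n then Some l else None) table
```

The direction from OCaml to C is easy; the direction from C to OCaml has to decide what to do with unknown values, which is where an `option` (or a catch-all constructor) comes in.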
Thank you for your comments, they helped me in the development of camlid. An example of the pattern we talked about can be found there.
For enums I have not yet added the helper, but the correspondence C/OCaml would be handled in C with a simple generated switch. For flags, it is also not yet implemented, but an unboxed, untagged and noalloc external function lab1:bool -> lab2:bool -> lab3:bool -> lab4:bool -> Int32.t wrapped into a function ?lab1:bool -> ?lab2:bool -> ?lab3:bool -> ?lab4:bool -> unit -> Int32.t seems good enough.
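The flag wrapper described above could look roughly like the sketch below. This is not camlid output: `raw_flags` is an OCaml stand-in for the external (which would be implemented in C), and the bit positions are assumptions chosen for illustration.

```ocaml
(* Stand-in for the external with labelled bool arguments. A real binding
   would declare it with [external] and implement it in C; the bit positions
   used here are hypothetical. *)
let raw_flags ~lab1 ~lab2 ~lab3 ~lab4 : Int32.t =
  let bit b n = if b then Int32.shift_left 1l n else 0l in
  List.fold_left Int32.logor 0l
    [ bit lab1 0; bit lab2 1; bit lab3 2; bit lab4 3 ]

(* User-facing wrapper: every flag is optional and defaults to [false]; the
   final [unit] argument lets the optional arguments be omitted. *)
let flags ?(lab1 = false) ?(lab2 = false) ?(lab3 = false) ?(lab4 = false) () =
  raw_flags ~lab1 ~lab2 ~lab3 ~lab4
```

A call such as `flags ~lab1:true ~lab3:true ()` then only mentions the flags that are set, and the C constants never leak into the OCaml source.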
Skimming, it looks like this is mostly for C++? (also C, but mostly aimed at C++?) If you assume well-behaved C++ (e.g. following the Google C++ style guide), it's much, much simpler to generate bindings; even then, sometimes you have to step in and give the FFI generator some help.
I've been convinced for a while (OK, for 10yr) that the right way to do this is to generate bindings against C++ that adheres to that style guide, and for C, hand-write C++ wrappers to that standard first.
Also, if someone passing by would like to write a gobject-introspection binding generator, that would be extremely cool and useful to leverage libraries from the GObject/Gnome ecosystem (such as gstreamer) in OCaml.
GObject introspection is a middleware layer between C libraries (using GObject) and language bindings. The C library can be scanned at compile time and generate metadata files, in addition to the actual native C library. Then language bindings can read this metadata and automatically provide bindings to call into the C library.
I'm doing this, but it's some way away from being ready (hence no announcement from me).
For clarity: this is not a direct port of lablgtk3; it is heavily inspired by its architecture and internal design, but uses GIR directly. It's not a full-time project and still needs significant uplift to get it to a usable and complete state. I appreciate there may be some interest in this but I am not confident in its viability yet.
It aims at C, but it is true that the "methods" that a data structure can provide are reminiscent of C++ objects. In a way it automatically does what you propose, by automatically generating the boilerplate from a "C++ style".
One of the reasons I think that C++ is a good target for FFI bindings, is that C++ templates allow for automatic type-based generation of conversion code. You can see that used to excellent effect in Rust FFI bindings also.
SWIG is powerful but it's somewhat brittle and the generated code is absolutely huge with tons of dynamism and conversions. You can look at some generated code and you should immediately be able to decide if it's the path you want.
I think it's less brittle than it used to be but I think it's mostly due to a lot of whack-a-mole. Issues also still appear and last year it was failing for mcrypto (python), due to the inclusion of a new system header by openssl that had been forgotten before. That header defines _GNU_SOURCE or similar, which in turn includes additional headers, exposes more types and fields, and suddenly swig was trying to bind every system header.