C Stub: Sometimes the Old Ways Are Best?

Ctypes has been my go-to for writing C stubs. However, it requires a lot of work in OCaml (allocation, finalization) and the build is far from straightforward, since the OCaml code needs information that only the C compiler knows.

Camlidl has an approach that is a lot simpler and still customizable: from the signature of a C function, it generates both the C and the ML code. Moreover, it supports custom_block. It just needs to be a little more customizable for some common cases:

  • initialization: some out parameters need to be initialized by calling a library function before making the C call
  • context: libraries can require a common data structure to be passed to all functions, including finalization and initialization
  • reference counting

However, I’m wondering whether this pushes the design of Camlidl too far. What are you using for your bindings? Is SWIG useful? It seems to quickly leave you doing everything by hand (e.g. arrays with an unknown length).

I tried writing a POC for an OCaml lib that would generate C code, so that it is easy to customize and you get to write in a language with LSP support. The description algebra is not satisfying yet (though Format with tags is fun), and I don’t want to add a spurious alternative to the ecosystem.

4 Likes

It depends on the design of the C library itself and how you decide to bind to it (and the extent of the API surface area you need to bind to). In general I think it’s better to do thin, mostly one-to-one even if unsafe, bindings and make them safe behind a module using OCaml code.

If that works for you then personally I think the bare OCaml FFI is fine. There are a few patterns that should be better described in the manual (mostly for custom values, finalized or not, to store C pointers and different strategies on how to deal with enums) but it ends up being rather straightforward.

Here is a recent example of mine: the API surface is not small, but the C API is clean, so the amount of glue is minimal and it’s mostly bare C calls with the appropriate conversions on arguments.

3 Likes

Wow, it is a huge binding! I’m impressed.

This pattern is repeated often (e.g. psa_export_public_key):

  if (st == PSA_SUCCESS)
    return bytesrw_alloc_ok (Val_long (written));
  else
    return bytesrw_alloc_error (C_psa_status_t_to_val (st));

I would quickly have felt the urge to factor it out. A cool specification in the spirit of camlidl/swig would be:

[with written] psa_status_t  psa_export_public_key(
            [in] psa_key_id id,
            [in bigbyte_data] uint8_t * dd,
            [in bigbyte_len] int dd,
            [ignore] size_t* written)

Unfortunately, I think camlidl is not extensible enough to have a conversion that uses the result and written at the same time. It is perhaps possible with SWIG. With camlid it should be possible, but it’s a POC.

Do you think such a declarative approach would be simpler to write and maintain?

I used to use camlidl for a ton of stuff, for the reasons you mention. Then came a day when I wanted to wrap a large C++ library (Rocksdb). It had a C API (wrapped around the C++), so I started doing it, and... wow, that was a lot of work, because you had to write so much memory-management code. And the API was complicated, so... a lot of work.

Now, Rocksdb (descended from Leveldb) has one thing going for it: it adheres to the Google C++ Style Guide calling conventions. So memory ownership is predictable based on method type-signatures. And all memory-allocated objects have well-understood lifetimes (again b/c of adherence to the Style Guide). So it’s possible to generate the C stub from the C++ type signature, using template-based metaprogramming.

A long time ago, I did just that, in a little project called ‘cppffigen’. I used it to wrap Rocksdb, and it was great. It’s been a long, long time since I hacked on that stuff. I should go update it. Not suggesting you should use any of it, but the basic idea of using C++ templates as a basis for code generation is a good one. I’ve noticed that Rust uses its traits/typeclasses in pretty much the same way, for the same purposes.

2 Likes

Maybe, maybe not :–) But I think I’m generally happier with low level technology. The problem with stub generators is that:

  1. You need to learn (and remember for maintenance) a new language.
  2. You need to make sure the binding patterns you are going to need are supported by the language (e.g. you seem to suggest that one I took in that binding wouldn’t be).
  3. The day there is a problem you need to (re)understand the language and the resulting code generation.
  4. It’s yet another tool you need to make sure gets maintained over the years.

For this reason I’m personally more inclined to low level WYSIWYG technology for FFI bindings (also the approach I took for JavaScript in brr).

While you may pay a higher initial binding cost (though a bit of M-x query-replace-regexp semi-automation helps with transforming certain API patterns), I’d say that in the long term it’s easier for maintenance, auditing and spotting problems. It’s also more friendly for drive-by contributors and/or reviewers, if they know the FFI of the language of course.

But then I don’t say that’s always the approach that should be taken. For example for tgls I generate the (ctypes-based) bindings because there are XML files that describe the API.

P.S. About the thing you mention you would have abstracted away: I remember, in the past, metaprogramming bindings with horrendous C macros to be as DRY as possible. But then, faced with a segfault, it would become hard to assess the correctness of this dryness. Especially when I’d get the problem a couple of months later and have forgotten about my nice macro-level atrocities :–)

2 Likes

By the way, apart from ownership and callbacks from C, this (dealing with enums and constants) is by far the most annoying thing when binding to C.

For example, in tsdl (ctypes-based) I have a C program that generates OCaml code with the constants. Another approach is to use a plain variant and have a mapping table on the C side (example in Unix), with the disadvantage that you need to keep that table in sync with the OCaml source. Both approaches are fine if you don’t need the inverse map; if you need to pattern match the enum in OCaml (versus just using the values to specify stuff to the C API), it becomes extra annoying.
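To make the mapping-table approach concrete, here is a minimal sketch in plain C. The `color` enum, its values, and both function names are hypothetical and not taken from tsdl or Unix; in a real stub the index would come from `Int_val` and go back through `Val_int`, which is left out so the example compiles without the OCaml headers.

```c
#include <stddef.h>

/* Hypothetical C-side enum; the values are deliberately non-contiguous. */
enum color { COLOR_RED = 10, COLOR_GREEN = 20, COLOR_BLUE = 40 };

/* Must stay in the same order as the (assumed) OCaml declaration
     type color = Red | Green | Blue
   because OCaml numbers constant constructors 0, 1, 2, ... */
static const int color_table[] = { COLOR_RED, COLOR_GREEN, COLOR_BLUE };
#define COLOR_TABLE_LEN (sizeof color_table / sizeof color_table[0])

/* OCaml -> C: the constructor index is a direct table lookup.
   Returns -1 when the index is out of range. */
int color_of_index(int index)
{
  if (index < 0 || (size_t)index >= COLOR_TABLE_LEN) return -1;
  return color_table[index];
}

/* C -> OCaml: the inverse map needs a search over the same table.
   Returns -1 for a C value that has no OCaml constructor. */
int index_of_color(int c_value)
{
  for (size_t i = 0; i < COLOR_TABLE_LEN; i++)
    if (color_table[i] == c_value) return (int)i;
  return -1;
}
```

The sync problem is visible here: reordering the OCaml constructors without reordering `color_table` silently changes the mapping, and nothing in either compiler will catch it.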

I remember that at some point I was actually pre-processing OCaml sources with cpp and the library’s C includes in order to get those constants. Quite ugly, but it worked. I’m no longer sure why I stopped doing that :–)
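For comparison, the generator approach (a C program that prints OCaml definitions for the constants) could look roughly like the sketch below. The `MYLIB_*` constants and the function names are made up for illustration; a real generator would include the library’s own header instead of defining them.

```c
#include <stdio.h>

/* Hypothetical constants standing in for those of a real library header. */
#define MYLIB_FLAG_READ  1
#define MYLIB_FLAG_WRITE 2
#define MYLIB_MAX_NAME   255

/* Format one OCaml binding using the value the C compiler sees.
   Returns what snprintf returns (length written, or a negative error). */
int format_constant(char *buf, size_t size, const char *ocaml_name, long value)
{
  return snprintf(buf, size, "let %s = %ld\n", ocaml_name, value);
}

/* Write the whole generated OCaml module to `out`. */
void emit_constants(FILE *out)
{
  char line[128];
  fputs("(* Generated file, do not edit. *)\n", out);
  format_constant(line, sizeof line, "flag_read", MYLIB_FLAG_READ);
  fputs(line, out);
  format_constant(line, sizeof line, "flag_write", MYLIB_FLAG_WRITE);
  fputs(line, out);
  format_constant(line, sizeof line, "max_name", MYLIB_MAX_NAME);
  fputs(line, out);
}
```

A small `main` that calls `emit_constants(stdout)` would be compiled and run at build time, with its output redirected to an `.ml` file. That build step is exactly the extra complexity the thread is weighing against hand-maintained tables.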

I am using a similar approach: a C program exports the constants that come from the C side.

Before that, I was making the constants available from C to OCaml.

1 Like

Thank you for your comments, they helped me in the development of camlid. An example of the pattern we talk about can be found there.

For enums I have not yet added the helper, but the C/OCaml correspondence would be handled in C with a simple generated switch. For flags, it is also not yet implemented, but an unboxed, untagged and noalloc external function lab1:bool -> lab2:bool -> lab3:bool -> lab4:bool -> Int32.t wrapped into a function ?lab1:bool -> ?lab2:bool -> ?lab3:bool -> ?lab4:bool -> unit -> Int32.t seems good enough.
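As a rough illustration of the C side of that idea (not camlid’s actual output, which is not implemented yet according to the post), a generated switch and a flag-combining function might look like this. The `shape` enum, the `FLAG_LAB*` bit values, and all function names are assumptions; the OCaml `external` declaration and its attributes are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C enum and flag bits, standing in for a real library's. */
enum shape { SHAPE_CIRCLE = 3, SHAPE_SQUARE = 7 };
#define FLAG_LAB1 0x1
#define FLAG_LAB2 0x2
#define FLAG_LAB3 0x4
#define FLAG_LAB4 0x8

/* OCaml constructor index -> C value, as a generator could emit it.
   Returns -1 for an unknown index. */
int shape_of_index(int index)
{
  switch (index) {
  case 0: return SHAPE_CIRCLE;
  case 1: return SHAPE_SQUARE;
  default: return -1;
  }
}

/* C value -> OCaml constructor index (the inverse direction).
   Returns -1 for a C value with no matching constructor. */
int index_of_shape(int c_value)
{
  switch (c_value) {
  case SHAPE_CIRCLE: return 0;
  case SHAPE_SQUARE: return 1;
  default: return -1;
  }
}

/* One bool per flag in, one 32-bit mask out. No allocation and no
   OCaml runtime calls, which is what would make a noalloc external safe. */
int32_t make_flags(bool lab1, bool lab2, bool lab3, bool lab4)
{
  int32_t flags = 0;
  if (lab1) flags |= FLAG_LAB1;
  if (lab2) flags |= FLAG_LAB2;
  if (lab3) flags |= FLAG_LAB3;
  if (lab4) flags |= FLAG_LAB4;
  return flags;
}
```

Because both switches are generated from one description, the two directions cannot drift apart the way a hand-maintained table and a variant declaration can.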

1 Like

Have there been experiments that use libclang to generate OCaml bindings, like bindgen does for Rust? This could remove the need for a custom DSL.

Skimming, it looks like this is mostly for C++? (Also C, but mostly aimed at C++?) If you assume well-behaved C++ (e.g. following the Google C++ style guide), it’s much, much simpler to generate bindings; even then, sometimes you have to step in and give the FFI generator some help.

I’ve been convinced for a while (OK, for 10yr) that the right way to do this is to generate bindings against C++ that adheres to that style guide, and for C, hand-write C++ wrappers to that standard first.

Also, if someone passing by would like to write a gobject-introspection binding generator, that would be extremely cool and useful to leverage libraries from the GObject/Gnome ecosystem (such as gstreamer) in OCaml.

GObject introspection is a middleware layer between C libraries (using GObject) and language bindings. The C library can be scanned at compile time and generate metadata files, in addition to the actual native C library. Then language bindings can read this metadata and automatically provide bindings to call into the C library.

There was cedlemo/OCaml-GObject-Introspection (OCaml bindings to GObject-Introspection based on OCaml-Ctypes), but it needs some love.

I’m doing this, but it’s some way away from being ready (hence no announcement from me).

For clarity: this is not a direct port of lablgtk3. It is heavily inspired by its architecture and internal design, but uses GIR directly. It’s not a full time project and still needs significant uplift to get it to a usable and complete state. I appreciate there may be some interest in this but I am not confident in its viability yet.

1 Like

It aims at C, but it is true that the “methods” a data structure can provide are reminiscent of C++ objects. In a way it does what you propose, by automatically generating the boilerplate from a “C++ style”.

One of the reasons I think that C++ is a good target for FFI bindings, is that C++ templates allow for automatic type-based generation of conversion code. You can see that used to excellent effect in Rust FFI bindings also.

1 Like

SWIG is powerful but it’s somewhat brittle and the generated code is absolutely huge with tons of dynamism and conversions. You can look at some generated code and you should immediately be able to decide if it’s the path you want.

I think it’s less brittle than it used to be, but mostly thanks to a lot of whack-a-mole. Issues still appear: last year it was failing for mcrypto (python), due to openssl including a new system header that had previously been left out. That header defines _GNU_SOURCE or similar, which in turn includes additional headers, exposes more types and fields, and suddenly swig was trying to bind every system header.

1 Like