C Stubs: Sometimes the Old Ways Are Best?

Ctypes has been my go-to for writing C stubs. However, it requires a lot of work on the OCaml side (allocation, finalization), and the build is far from straightforward since the OCaml code needs information that only the C compiler knows.

Camlidl takes an approach that is a lot simpler and still customizable: from the signature of a C function, it generates both the C and the ML code. Moreover, it supports custom blocks. It would just need to be a little more customizable for some common cases:

  • initialization: some out parameters need to be initialized by calling a library function before making the C call (see the sketch after this list)
  • context: libraries can require passing a common data structure to every function, e.g. for initialization and finalization
  • reference counting
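
To make the first case concrete, here is a minimal sketch of a hand-written stub where the out parameter must be set up by a library call before the actual call is made. All mylib_* names are hypothetical; only the OCaml FFI calls are real:

    #include <caml/mlvalues.h>
    #include <caml/memory.h>
    #include <caml/alloc.h>
    #include <caml/fail.h>

    /* Hypothetical library API, for the sake of the example: */
    typedef struct { void *impl; } mylib_buf;
    void mylib_buf_init (mylib_buf *);
    int  mylib_query (mylib_buf *);
    const char *mylib_buf_data (const mylib_buf *);
    void mylib_buf_clear (mylib_buf *);

    CAMLprim value caml_mylib_query (value unit)
    {
      CAMLparam1 (unit);
      CAMLlocal1 (result);
      mylib_buf buf;
      mylib_buf_init (&buf);              /* the "initialization" step */
      if (mylib_query (&buf) != 0) {
        mylib_buf_clear (&buf);
        caml_failwith ("mylib_query");
      }
      result = caml_copy_string (mylib_buf_data (&buf));
      mylib_buf_clear (&buf);
      CAMLreturn (result);
    }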

However, I’m wondering whether that is pushing the design of Camlidl too far. What are you using for your bindings? Is SWIG useful? It seems to quickly make you do everything by hand (e.g. arrays of unknown length).

I tried writing a POC of an OCaml library that would generate the C code, so that it is easy to customize and you get to write in a language with LSP support. But the description algebra is not satisfying yet (though Format with tags is fun), and I don’t want to add a spurious alternative to the ecosystem.

2 Likes

It depends on the design of the C library itself and how you decide to bind to it (and the extent of the API surface you need to bind to). In general I think it’s better to write thin, mostly one-to-one (even if unsafe) bindings and make them safe behind a module using OCaml code.

If that works for you then personally I think the bare OCaml FFI is fine. There are a few patterns that should be better described in the manual (mostly custom values, finalized or not, to store C pointers, and the different strategies for dealing with enums), but it ends up being rather straightforward.
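
For instance, the custom-value pattern for storing a C pointer usually boils down to something like this sketch (mylib_ctx and mylib_ctx_free are hypothetical; the FFI calls are real):

    #include <caml/mlvalues.h>
    #include <caml/alloc.h>
    #include <caml/custom.h>

    /* Hypothetical library type and destructor: */
    typedef struct mylib_ctx mylib_ctx;
    void mylib_ctx_free (mylib_ctx *);

    #define Ctx_val(v) (*((mylib_ctx **) Data_custom_val (v)))

    static void ctx_finalize (value v)
    { mylib_ctx_free (Ctx_val (v)); }

    static struct custom_operations ctx_ops = {
      "mylib.ctx", ctx_finalize,
      custom_compare_default, custom_hash_default,
      custom_serialize_default, custom_deserialize_default,
      custom_compare_ext_default, custom_fixed_length_default
    };

    /* Wrap a C pointer in a finalized custom block. */
    static value alloc_ctx (mylib_ctx *ctx)
    {
      value v = caml_alloc_custom (&ctx_ops, sizeof (mylib_ctx *), 0, 1);
      Ctx_val (v) = ctx;
      return v;
    }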

Here is a recent example of mine; the surface API is not small, but the C API is clean, so the amount of glue is minimal: it’s mostly bare C calls with the appropriate conversions on arguments.

1 Like

Wow, it is a huge binding! I’m impressed.

This pattern is repeated often (e.g. in psa_export_public_key):

  if (st == PSA_SUCCESS)
    return bytesrw_alloc_ok (Val_long (written));
  else
    return bytesrw_alloc_error (C_psa_status_t_to_val (st));
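
In plain C, the obvious factorization is a small helper of this kind (just a sketch, not code from the actual binding; it assumes the binding’s psa and bytesrw_* declarations are in scope):

    #include <caml/mlvalues.h>

    /* Build the Ok/Error result from the status and the byte count. */
    static value return_written_or_error (psa_status_t st, size_t written)
    {
      if (st == PSA_SUCCESS)
        return bytesrw_alloc_ok (Val_long (written));
      else
        return bytesrw_alloc_error (C_psa_status_t_to_val (st));
    }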

I would quickly have gotten the urge to factorize it along those lines. A cool specification in the spirit of camlidl/swig would be:

[with written] psa_status_t  psa_export_public_key(
            [in] psa_key_id id,
            [in bigbyte_data] uint8_t * dd,
            [in bigbyte_len] int dd,
            [ignore] size_t* written)

Unfortunately, I think, camlidl is not extensible enough to allow a conversion that uses the result and written at the same time. It is perhaps possible with SWIG. With camlid it should be possible, but it’s only a POC.

Do you think such a declarative approach would be simpler to write and maintain?

I used to use camlidl for a ton of stuff, for the reasons you mention. Then came a day when I wanted to wrapper a large C++ library (Rocksdb). It had a C API (wrappered around the C++), so I started doing it, and … wow, that was a lot of work, b/c you had to write so much memory-management code. And the API was complicated, so … a lot of work.

Now, Rocksdb (descended from Leveldb) has one thing going for it: it adheres to the Google C++ Style Guide calling conventions. So memory ownership is predictable based on method type-signatures. And all memory-allocated objects have well-understood lifetimes (again b/c of adherence to the Style Guide). So it’s possible to generate the C stub from the C++ type signature, using template-based metaprogramming.

A long time ago, I did just that, in a little project called ‘cppffigen’. I used it to wrapper Rocksdb, and it was great. It’s been a long, long time since I hacked on that stuff. I should go update it. Not suggesting you should use any of it, but the basic idea of using C++ templates as a basis for code generation is a good one. I’ve noticed that Rust uses its traits/typeclasses in pretty much the same way, for the same purposes.

2 Likes

Maybe, maybe not :–) But I think I’m generally happier with low-level technology. The problems with stub generators are that:

  1. You need to learn (and remember for maintenance) a new language.
  2. You need to make sure the binding patterns you are going to need are supported by the language (e.g. you seem to suggest that the one I took in that binding wouldn’t be).
  3. The day there is a problem you need to (re)understand the language and the resulting code generation.
  4. It’s yet another tool you need to make sure gets maintained over the years.

For this reason I’m personally more inclined to low-level WYSIWYG technology for FFI bindings (also the approach I took for JavaScript in brr).

While you may pay a higher initial binding cost (though a bit of M-x query-replace-regexp semi-automation helps with transforming certain API patterns), I’d say that in the long term it’s easier for maintenance, auditing and spotting problems. It’s also friendlier to drive-by contributors and/or reviewers – if they know the FFI of the language, of course.

But then I’m not saying that’s always the approach that should be taken. For example, for tgls I generate the (ctypes-based) bindings because there are XML files that describe the API.

P.S. Regarding the thing you mention you would have abstracted away: I remember, in the past, metaprogramming bindings with horrendous C macros to be as DRY as possible. But then, faced with a segfault, it would become hard to assess the correctness of this dryness. Especially when I’d hit the problem a couple of months later and had forgotten about my nice macro-level atrocities :–)
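
The macro-level DRYness in question tends to look something like the following sketch. It is not code from the binding discussed above: the stub names, argument conversions (e.g. Int_val for the key id) and the macro itself are made up, and it assumes the psa and bytesrw_* declarations are in scope:

    #include <caml/mlvalues.h>
    #include <psa/crypto.h>

    /* Stamp out a whole stub for "status + written" functions. */
    #define DEF_EXPORT_STUB(name)                                        \
      CAMLprim value ocaml_##name (value v_key, value v_buf)             \
      {                                                                  \
        size_t written = 0;                                              \
        psa_status_t st = name (Int_val (v_key), Bytes_val (v_buf),      \
                                caml_string_length (v_buf), &written);   \
        if (st == PSA_SUCCESS)                                           \
          return bytesrw_alloc_ok (Val_long (written));                  \
        else                                                             \
          return bytesrw_alloc_error (C_psa_status_t_to_val (st));       \
      }

    DEF_EXPORT_STUB (psa_export_key)
    DEF_EXPORT_STUB (psa_export_public_key)

Convenient while it works; much less so once you are staring at a segfault in the expansion.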

2 Likes

By the way, except for ownership and callbacks from C, dealing with enums is by far the most annoying thing when binding to C.

For example, in tsdl (ctypes-based) I have a C program that generates OCaml code with the constants (see the sketch below). Another approach is to use a plain variant and have a mapping table on the C side (there is an example in Unix), with the disadvantage that you need to keep that table in sync with the OCaml source. Both approaches are fine if you don’t need the inverse map; if you need to pattern match on the enum in OCaml (versus just using it to specify stuff to the C API), it becomes extra annoying.
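
The first approach boils down to a tiny generator program of this kind (a sketch, not tsdl’s actual generator; errno constants stand in for whatever the bound library defines):

    #include <stdio.h>
    #include <errno.h>

    /* Print an OCaml module with the values of the C constants;
       the build runs this program and redirects its output to consts.ml. */
    int main (void)
    {
      printf ("(* Generated, do not edit. *)\n");
      printf ("let eagain = %d\n", EAGAIN);
      printf ("let eintr = %d\n", EINTR);
      printf ("let einval = %d\n", EINVAL);
      return 0;
    }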

I remember that at some point I was actually pre-processing OCaml sources with cpp and the library’s C includes in order to get those constants. Quite ugly, but it worked; I’m no longer sure why I stopped doing that :–)

I am using a similar approach: a C program that exports the constants coming from the C side:

Before that, I was making the constants available from C to OCaml:

1 Like