[ANN] clangml 4.2.0: OCaml bindings for Clang API (for C and C++ parsing)

I also started experimenting a couple months ago on a tool/library that uses clangml to generate C bindings from C headers. (at that time this uncovered a couple bugs in clangml and @thierry-martinez was very quick in fixing them, so many thanks to him)

My use case was to generate bindings for a library with a large API surface (many functions, many public structures with many fields), that evolves over time, but which follows consistently a set of conventions through its API. Furthermore, the library manages memory in a way that is quite convenient for FFI bindings (no ownership transfer, the library manages its own memory and only produces handles (structure pointers) to memory it manages itself—the lifetime of data is less obvious but it’s less crucial for OCaml bindings).

Here’s a summary of some of the design constraints and observations that I can remember:

  • The generator must produce wrapper code that directly corresponds to an idiomatic OCaml API. The goal is to avoid manually going through the whole surface of the C API, so there is no point in e.g. generating a description of the library in some C-like OCaml eDSL (e.g. ctypes) if one then needs to manually write wrapper code on top of that.
  • Thanks to the fact that the C library implements some conventions and uniform memory discipline, it should be possible to write generator code that takes advantage of these conventions.
    For instance, the memory management discipline of the library meands that I can build in my generator that for any structure s_foo the library exposes, a value of type *s_foo can always be represented on the ocaml side as an integer value corresponding to the value of the pointer, and exposed as an abstract type s_foo (complemented with accessor functions for the fields of the structure)—without having to copy data around.
  • So ideally, there would be a generator library providing relatively generic components helping process and iterate over the C headers AST, that one would then use to implement a generator tool that bakes in assumptions and knowledge about the implicit conventions of a specific library.
  • Generating wrapper code for structure accessors or enum declarations is automatable in a relatively generic way. Nevertheless, some wrapper code (specific to the given library) will have to be written by hand. I wrote wrapper code by hand for the callback system of the library I was looking at and their implementation of linked lists, then made these manual bindings available to my automated generator.
  • Even writing all this library-conventions-specific code, I’m not sure that the bindings generation can be completely press-button. In particular, it seems hard to account for the common pattern of functions returning their results through arguments. If a library exposes a function void rotate(int x, int y, int z, int* ox, int* oy, int* oz), are ox, oy and oz arguments of the function, or used as return values? (in which case the wrapper code should generate an ocaml function returning a tuple). It’s quite obvious here that they are return values, but it seems hard to automate that without relying on brittle heuristics.
    Maybe that means that user must specify by hand the “polarity” of arguments for every function of the API that uses some arguments as return values?

To wrap up, I think it’s an interesting approach, certainly more viable than writing the bindings by hand (even with ctypes) as I started doing, and I’m very glad that clangml exists so that we can experiment with it.
What I found hard was to draw a line between components of the bindings generator library that should be reusable, and parts that were making assumptions about conventions followed by the specific C library I was looking at.

I will try to release the code and write a blog post about it at some point.

I was also aware of the bindgen tool for rust, but not python’s ctypeslib, so that’s something that I should indeed check out.

3 Likes