Yesterday I was mentioned in an LLVM RFC on disabling and removing the OCaml bindings from the source tree. I think it’s worth forwarding the message here to a broader audience.
Whether OCaml bindings should live in-tree or out-of-tree was discussed in my previous post on z3. This time, however, I believe the question is not a (theoretical) one of which is better: the bindings may stop working and be moved out. It’s more worrisome in the OCaml 5 era because of some breaking changes in the C API.
Before tackling this problem, my first thought is that the OCaml Platform makes sure some common libraries, e.g. opam, dune, ppxlib, keep working. My second thought is that there is already a health check for opam packages (http://check.ocamllabs.io).
Maybe we can extend that care to the OCaml bindings of these libraries. I can think of some problems here:
1. The breakage occurs on their trunk branch, much earlier than the binding is updated in opam. One way to address this may be to set up some CI to monitor that. My feeling is that most libraries won’t change much in how they are built and installed, so one CI setup can last a long time.
2. Subtle platform-related building/linking problems. My experience may be limited to {z3, llvm, ocaml-torch} * {ubuntu, wsl, macos}, but I believe I have encountered enough similar problems. It would be better if the care covered these common platforms.
3. We may need OCaml binding maintenance guidelines that reside together with the OCaml documentation/tutorial on the C API. They could be helpful to our community and to other communities.
(Please allow me to add my not-so-humble 5 cents.) I am confident I can solve these binding problems on the technical side; however, I need mentoring on how the OCaml Platform and ecosystem work.
Thanks for the note @arbipher. @tmattio has been thinking about the larger platform roadmap and may be interested to chime in.
and from the original thread
OCaml used to be stable in its C API and FFI. OCaml 5 makes unavoidable breaking changes to support algebraic effects and multicore. I also wonder if there is a better practice for making/maintaining the bindings; let me check on that.
This is not correct. Sequential programs running on OCaml 4 can be moved to OCaml 5 without any breakages. This was a very explicit design choice. The only issues will be unrelated, deprecated functions and features finally removed as part of the major version bump. If you have concrete examples of breakages, please let me know and I am happy to have a look.
Pure OCaml 4 programs still work with OCaml 5, but the FFI has a breaking change: naked pointers are no longer supported. The LLVM bindings use naked pointers, so I have been authoring a patch to replace the use of naked pointers so that the bindings work on OCaml 5. My patch currently passes the tests, so I feel optimistic, but it’s a lot of code to review.
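For readers who have not run into the term: a naked pointer is a C pointer to memory outside the OCaml heap that is stored directly as an OCaml value. One documented replacement is to box the pointer in an Abstract_tag block. The sketch below shows only that pattern; it is not the LLVM patch, and thing_t, thing_create and thing_size are hypothetical names standing in for a C library.

```c
#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <caml/memory.h>

/* Hypothetical C library type and functions (assumptions for illustration). */
typedef struct thing thing_t;
thing_t *thing_create(void);
int thing_size(thing_t *t);

/* OCaml 4 style, no longer supported by OCaml 5:
     value naked = (value) thing_create();
   The C pointer is cast straight to an OCaml value. */

/* Box the pointer in a one-word Abstract_tag block, which the GC
   does not scan. */
static value val_of_thing(thing_t *p)
{
  value v = caml_alloc(1, Abstract_tag);
  *((thing_t **) Data_abstract_val(v)) = p;
  return v;
}

/* Read the boxed pointer back out. */
static thing_t *thing_of_val(value v)
{
  return *((thing_t **) Data_abstract_val(v));
}

CAMLprim value stub_thing_create(value unit)
{
  CAMLparam1(unit);
  CAMLreturn(val_of_thing(thing_create()));
}

CAMLprim value stub_thing_size(value v)
{
  CAMLparam1(v);
  CAMLreturn(Val_int(thing_size(thing_of_val(v))));
}
```

The OCaml manual also describes an alternative for pointers that are at least 2-byte aligned: set the low bit so the value looks like an integer to the GC. Which option suits a given binding depends on how the handles are used.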
My team at Tarides provides the infrastructure and CI systems for check.ocamllabs.io, opam.ci.ocaml.org and ci.ocamllab.io (ocaml-ci). ocaml-ci can build a project hosted on GitHub or GitLab that uses a standard opam / dune build setup. I suspect the llvm bindings might not be so standard.
What does the llvm ocaml build setup look like? Could you provide a pointer to the source? I am not familiar with the project. In general we can build any Dockerfile on Linux and some restricted subset on macOS using ocluster. It might be possible to hack a custom pipeline if I can understand the build setup and express it as a Dockerfile.
Here is my summary for building LLVM itself and its OCaml binding.
(basic)
LLVM (llvm/llvm-project) is an umbrella monorepo for the sub-projects llvm (core structures and LLVM IR), clang (compiler), lld (linker), etc.
The majority of LLVM is written in C++, and it also provides a C library that wraps the C++ libraries.
The sub-project llvm is the core, providing the common structures and LLVM IR that other sub-projects can use.
The OCaml llvm binding on opam is built on top of llvm’s C library.
(On building LLVM from source)
Building LLVM takes two steps after cloning llvm/llvm-project.
4.1. Generate the build files for a build system via cmake -G Ninja <lots of parameters>.
4.2. Build it. Which sub-projects to build depends on the default settings and the parameters provided.
Currently, building OCaml binding is turned on by default (LLVM_ENABLE_BINDINGS is On).
OCaml binding is one of the official bindings, so in (4.1) if the condition check passes, the OCaml binding will be built in (4.2):
# Here is the log excerpt for condition check in (4.1)
# i.e. whether you have installed the correct OCaml toolchains and libraries
## case disabled
-- Found OCaml: /Users/ex/.opam/ocaml5/bin/ocamlfind
-- OCaml bindings disabled, need ctypes >=0.4.
## case enabled
-- Found OCaml: /Users/ex/.opam/ocaml5/bin/ocamlfind
-- OCaml bindings enabled.
The LLVM RFC proposes changing the OCaml binding build from default On to default Off and moving it to the peripheral tier. It doesn’t directly impact the OCaml binding, but it’s a bad signal.
(On building LLVM binding from opam)
The opam package llvm builds incrementally on top of step 4 (building LLVM from source). It depends on a virtual package conf-llvm that requires a system-level installation of the common LLVM libraries. This system-level installation performs steps (4.1) and (4.2) and is usually not OCaml-related. The opam package llvm then clones the source and generates the build project for the OCaml binding, using the system-level LLVM libraries so that not everything has to be built from scratch.
The source code for the LLVM binding lives in the LLVM tree, so the opam packages for the llvm binding contain just the opam file and a few patches. The patches are for building both statically linked and dynamically linked libraries.
That is very much appreciated.
However, at the point of writing a binding you don’t yet know how it is going to be used. To be fully general, a library may want to aim to support multicore, which means being careful about the things mentioned in the manual (the OCaml runtime lock no longer protects global C data structures, so bindings should not use function-local static variables, C globals, etc.). That can be done as a second step after fixing naked pointers, though.
I should point out that fixing the use of naked pointers needs to be done very carefully. I actually managed to introduce a race condition (in the sequential OCaml 4 mode too!) while doing that in Xen: each use of Abstract and Custom values now needs to be carefully inspected so that the dereference doesn’t happen with the runtime lock released. Previously that would’ve been fine since it was just pointer arithmetic, but now it is an actual dereference of an OCaml value that the GC may have moved.
Abstract tags are even more subtle: you cannot store the C pointer obtained from an abstract tag in a local variable, because the underlying OCaml value may move at any time (when the runtime lock is released), and that C pointer is just an offset into the Abstract tag, so it’ll end up pointing to a stale location.
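To make the hazard concrete, here is a minimal sketch of the safe pattern, assuming a payload stored inline in an Abstract_tag block. It is not taken from the Xen patches; struct region, Region_val and region_sync are hypothetical names, and addr is assumed to point at memory outside the OCaml heap.

```c
#include <stddef.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/signals.h>

/* Hypothetical payload stored inline in an Abstract_tag block. */
struct region { void *addr; size_t len; };
#define Region_val(v) ((struct region *) Data_abstract_val(v))

/* Hypothetical blocking C call that never touches the OCaml heap. */
int region_sync(void *addr, size_t len);

CAMLprim value stub_region_sync(value v)
{
  CAMLparam1(v);
  int rc;

  /* With the runtime lock held, copy the fields out of the OCaml block.
     Keeping Region_val(v) itself in a local would keep an address inside
     the OCaml heap, which can go stale once the lock is released. */
  void *addr = Region_val(v)->addr;
  size_t len = Region_val(v)->len;

  caml_enter_blocking_section();   /* releases the runtime lock */
  rc = region_sync(addr, len);     /* only C data is used here, never v */
  caml_leave_blocking_section();   /* re-acquires the runtime lock */

  CAMLreturn(Val_int(rc));
}
```

The rule of thumb this illustrates: anything needed while the lock is released is read into plain C locals first, and no macro that dereferences v appears between the two blocking-section calls.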
Here are some concrete examples of the kind of bugs to avoid while removing naked pointers:
[7/7] tools/ocaml/xc: Don't reference Custom objects with the GC lock released - Patchwork
[6/7] tools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released - Patchwork
As another concrete example of what might happen to OCaml bindings if they’re not continuously tested/paid attention to: someone refactors some C function and introduces a new parameter and, trying to be helpful, updates the OCaml binding to pass the new parameter, but forgets to update the .ml file! [5/7] tools/ocaml/xc: Fix binding for xc_domain_assign_device() - Patchwork
This one can actually be detected quite easily at compile time (no need for fancy static analyzers or a CI), here is my attempt on how to do it for Xen (it requires a new enough OCaml that has compiler-libs): tools/ocaml: generate a .h file to check arity of OCaml C stubs · edwintorok/xen@29dea80 · GitHub. The approach could be adapted to work with LLVM, to at least catch bytecode function arity bugs at build time (though perhaps using migrate-parsetree would be a better idea than compiler-libs directly).
Native function arity is a bit more complicated to compute (with unboxed annotations/etc.) so I’ve only done it in the static analyzer for now, but if it is useful I could look into making a small arity checker that does both.
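The idea behind the generated-header check can be sketched as follows. This is only an illustration of the mechanism, not the generator from the commit linked above; stub_dev_assign, dev_assign and the OCaml declaration in the comment are hypothetical.

```c
#include <caml/mlvalues.h>
#include <caml/memory.h>

/* Hypothetical C library call with three parameters. */
int dev_assign(int handle, int domid, int sbdf);

/* Prototype a generator might emit for the hypothetical OCaml declaration
     external assign : int -> int -> int -> unit = "stub_dev_assign"
   Three OCaml arguments become three value parameters. In practice this
   line would live in a generated .h file included by the stub file. */
CAMLprim value stub_dev_assign(value, value, value);

/* The hand-written stub. If someone adds a fourth parameter here without
   touching the .ml file, the definition no longer matches the prototype
   above and the C compiler rejects it with a conflicting-types error. */
CAMLprim value stub_dev_assign(value handle, value domid, value sbdf)
{
  CAMLparam3(handle, domid, sbdf);
  dev_assign(Int_val(handle), Int_val(domid), Int_val(sbdf));
  CAMLreturn(Val_unit);
}
```

Externals with more than five arguments need a separate bytecode stub taking (value *argv, int argn), so a real generator has to handle that case as well.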
Not directly on topic, but I wonder if it is such a bad thing to move the LLVM bindings out of tree. I remember trying to fix a bug or two in the bindings in the past and I always found setting up the LLVM build quite cumbersome. Perhaps if the bindings lived in their own repository as a standard OCaml library, it would make it easier to contribute as well.