Finding a maintainable, sustainable build system for the LLVM bindings + Dune currently doesn't meet the package's specific needs

alan · September 22, 2023, 2:18pm

Hi, I saw recent discussion about Dune’s inclusion in the OCaml platform roadmap, the future of older build systems such as OCamlfind, and whether Dune should be the “blessed” build system. To preface, for most of my OCaml programming needs, Dune has been sufficient and easy to use. However, I’m concerned about the suitability of both OCamlfind and Dune for the LLVM OCaml bindings.

First of all, LLVM is a C++ project. The people developing LLVM are C++ programmers, most of whom don’t know OCaml. This obviously causes friction whenever people need to make changes that require them to touch the OCaml bindings (perhaps because they are changing the C bindings). Therefore, there has been talk on the LLVM forums of moving the OCaml bindings to the “peripheral tier” or out of tree, though I believe that if this happens, the OCaml bindings will eventually fall out of sync with new development. Both the Go and Python bindings have been recently removed, leaving the OCaml bindings as the only ones left in the repository. Therefore, I am nervous about making any drastic changes that may inconvenience the other LLVM developers or provide motivation for them to move the OCaml bindings out of tree.

As a C++ project, LLVM uses CMake. In the repository, the OCaml bindings use CMake via scripts that invoke OCamlfind. If CMake finds OCaml on the system, it will add the OCaml bindings as a build target and run the OCaml tests as part of the test suite. Therefore, the OCaml bindings are integrated into the rest of LLVM’s build system.

To my understanding, the CMake-OCamlfind build system has caused inconveniences when packaging the LLVM bindings for opam. Up until LLVM 15, the LLVM bindings had patches to their build files kept in opam-repository. The most significant patch modifies the META file to enable linking to either the static or shared LLVM libs, while an accompanying shell script detects if the user has the static and shared LLVM libs installed. This solution makes use of OCamlfind’s “predicate” feature: the predicate llvm.static activates linking to the static libs. However, Dune does not support OCamlfind predicates, causing issues when using LLVM from Dune.

@kit-ty-kate has worked on a repository, llvm-dune, that has the LLVM repository as a Git submodule and builds the LLVM OCaml bindings with Dune. She has recently transferred its ownership to me. llvm-dune has a shell script that checks for static and shared LLVM libs and dynamically generates dune files in the source tree. Instead of OCamlfind predicates, it makes use of Dune’s virtual library feature to allow users to select between the static and shared libs. Virtual libraries allow a choice between multiple implementations of a module. So, the script copies each LLVM module’s source code into static and shared directories together with an appropriate dune file with the appropriate linker flags, to create “static” and “shared” implementations.

I’m not sure if the intention was to eventually upstream a Dune-based build solution into LLVM, but my personal feeling is that using a shell script to generate Dune files, and then duplicating the source code in static and shared directories, is too hacky to go upstream. I tried to find a better solution with Dune, but I could not find one. Furthermore, I’m not sure how I would integrate Dune into LLVM’s CMake-based build system. So, my concerns with Dune are:

1. Dune does not support OCamlfind predicates.
2. I cannot find a non-hacky way for Dune to select between static and shared libraries. Virtual libraries involve duplicating source code. For this use case, the source code is exactly the same, but the linker flags are different.
3. I am unsure how to integrate Dune into a CMake-based project in a way that won’t inconvenience C++ programmers.

Dune developer @jeremiedimino admitted in 2019:

Improving interoperability between ocaml, ocamlfind and dune

Now, there are a few things we need to acknowledge. First of all, Dune is relatively young. It is only 3 years old and there were other tools and standards that existed long before it such as ocamlfind. Dune also has flaws. For instance, if your project doesn’t fit in the Dune language, then you just can’t do much. Your best bet is to come and discuss with us to see how we can move forward. While we have been thinking about this question for a while, and in particular how to provide users some sort of extensibility this is simply something we have not prioritized so far. If you are a seasoned programmer and like to have a lot of power over how your project is built, Dune can feel a bit invasive.

Finally, there are also cases where Dune projects need to be used as part of a bigger system, such as big bazel projects. In such cases, it can be valuable to have a few more tools such as ocamlfind to tie things together.

Ideally, I think that the LLVM OCaml bindings should neither need patches and scripts in opam-repository nor a separate repository with the LLVM repository as a Git submodule. However, I do not think that any OCaml build system is currently sufficient. I personally worry that favoring Dune over other build systems is a step backwards because it is too opinionated to integrate into a C++ project that does not revolve around OCaml and does not provide the features that the LLVM package needs.

I want to know what other people think and if a solution can be found.

dbuenzli · September 22, 2023, 2:53pm

Honestly if that is the truth, it rather looks easier to have the bindings as a separate OCaml project and let that project use the OCaml build system it wishes, dune or whatever the person who maintains them prefers.

Bindings are delicate things; I wouldn’t trust someone who doesn’t know OCaml to just fix the bindings as they go when they break them.

P.S. ocamlfind is not a build system; it’s a tool to look up installed libraries and their dependencies in order to compile and link against them. It’s used, for example, in the toplevel to look up and load libraries whether you use dune or not.

lindig · September 22, 2023, 3:55pm

I am not familiar with the details but would it be an option to build the OCaml part in LLVM simply using some scripts and give up all the smartness that a build system provides? Bindings rarely change their module or dependency structure and the OCaml compiler is so fast that full recompilation would not matter. The downside is that it puts a burden on the maintainer if dependencies change but it would minimise the dependency on OCaml build tools.

A similar friction exists in the Xen project, where a part is written in OCaml and was compiled using an arcane labyrinth of Makefiles. It took a very long time for the dune files to be accepted, and even then only for use when dune is available (which it typically is for OCaml developers but maybe not for maintainers of Linux distributions).

In general I would agree with @dbuenzli’s proposal to move the bindings into an OCaml project to be maintained by OCaml developers (who have the most incentive to keep it current).

jbeckford · September 22, 2023, 6:09pm

Aside: If LLVM is providing a test matrix where some builds can take several hours to complete, the last thing you want to do is try to replicate that in GitHub Actions.

But your main question was about Dune and CMake, and the answer to that is independent of where the LLVM bindings are built. Have you considered using (pardon the syntax problems … it has been a while since I’ve done it):

(context (default
 (name static)
 (env (_ 
  (link_flags (* output of `llvm_config --system-libs --link-static --libs` *) )
  (env-vars
   (LLVMCore_LIB /x/y/LLVMCore.a)
   (LLVMLinker_LIB /x/y/LLVMLinker.a))))))
(context (default
 (name shared)
 (env (_ 
  (link_flags (* output of `llvm_config --system-libs --link-shared --libs` *) )
  (env-vars
   (LLVMCore_LIB /x/y/LLVMCore.so)
   (LLVMLinker_LIB /x/y/LLVMLinker.so))))))

That can be generated with configure_file() in CMake and placed into a dune-workspace file. Then invoke Dune with dune build --workspace=${CMAKE_CURRENT_BINARY_DIR}/dune-workspace. I do something similar in https://diskuv.com/cmake/help/latest/.

Anyway, that should do away with the need for the virtual libraries. And I suspect that if you encounter problems with dune-workspace, they will be much easier for the Dune team to fix than implementing ocamlfind predicates. And because you have environment variables providing the dynamism, you should also be able to simplify: check in your dune files rather than have scripts to generate them.

One other recommendation: let CMake compile your .c files rather than (foreign_stubs (language c) (names ${cfile}) ...). Not only will you make your fellow LLVM maintainers happy, the compiler and compiler flags that come from ocamlc -config are very very unlikely to be what CMake uses. Try to use (foreign_archives ...) instead.

alan · September 22, 2023, 8:49pm

Thank you for the very helpful lead, @jbeckford. I’m trying this out and I haven’t gotten the bindings to build yet (I need to figure things out on the CMake side), but from the Dune side, it seems to work great so far.

How do Dune contexts work for installed packages? How would a client of the LLVM bindings select between the static and shared versions?

jbeckford · September 22, 2023, 9:27pm

Without knowing a bit more, I’d rely on opam rather than Dune to do the static/shared choice. opam, as a package manager, is more appropriate than a build tool to inspect system packages like LLVM. And because much of the opam functionality will end up in Dune, you aren’t entering a one-way door. So:

1. conf-llvm or something similar should expose an opam package variable that says whether there are static LLVM libraries, shared LLVM libraries, or both on the system. Said another way, conf-llvm.opam should generate a conf-llvm.config file in its build:[...] section.
2. You have two LLVM bindings packages: llvm-static and llvm-shared. To keep it simple, you can invoke dune build --workspace dune-workspace.static in llvm-static.opam’s build:[...] just as was suggested for your CMake scripts. Ditto for llvm-shared. This step requires some review and testing so it works with opam monorepo.
3. The llvm package uses a filtered package formula to select between llvm-static and llvm-shared. Something like the following in your llvm.opam:
```
 depends: [
   "conf-llvm"
   "llvm-static" { ?conf-llvm:has-static-llvm }
   "llvm-shared" { ?conf-llvm:has-shared-llvm }
 ]
```

And all the usual caveats that this is a rough sketch.

Edit 1: To avoid complexity, I’d initially stay with (foreign_stubs ...) since your dune file won’t need to change when testing on the LLVM test machines versus when installed via opam for regular ocaml users.

mobileink · September 23, 2023, 11:11am

Minimal bazelized version here. I did not add support for selecting static v. shared libs, that would be trivial so I just didn’t bother. I mainly just wanted to see how hard it would be to build the stuff with Bazel, and the answer is, not hard at all. OTOH I did not do any tests, so that’s a potential complicating factor.

Since I expect to have complete seamless Bazel-OPAM integration done pretty soon (Proof of Concept already implemented), I think this makes Bazel a realistic option for this issue.

Regarding the status of the bindings within the LLVM project (in-tree or not etc.) my vote would be to maintain the bindings as a completely separate, freestanding project. I think it would actually increase the chances of it flourishing, contrary to what many expect (that it would die or at least wither if not in the official LLVM source tree).

alan · September 23, 2023, 1:08pm

Last night I had a go at using Dune to build the bindings in-repo: https://github.com/alan-j-hu/llvm-project/blob/dune/llvm/bindings/ocaml/setup.sh. However, I got stuck. When I run dune build @all --release, the following error occurs:

File "llvm/dune", line 1, characters 0-135:
1 | (library
2 |  (name llvm)
3 |  (public_name llvm)
4 |  (foreign_stubs
5 |   (language c)
6 |   (names llvm_ocaml))
7 |  (c_library_flags %{env:LLVMCore_LIB=}))
(cd _build/static && /home/me/.opam/5.1.0/bin/ocamlmklib -g -o llvm/llvm_stubs llvm/llvm_ocaml.o -ldopt '-L/home/me/llvm-project/builddune/lib  -lstdc++ -fPIC -lLLVMCore -lLLVMRemarks -lLLVMBitstreamReader -lLLVMBinaryFormat -lLLVMTargetParser -lLLVMSupport -lLLVMDemangle')
/usr/bin/ld: llvm/llvm_ocaml.o: warning: relocation against `llvm_diagnostic_handler_trampoline' in read-only section `.text'
/usr/bin/ld: llvm/llvm_ocaml.o: relocation R_X86_64_PC32 against symbol `llvm_diagnostic_handler_trampoline' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

I do not know why this happens. The function llvm_diagnostic_handler_trampoline in llvm_ocaml.c seems to be compiled in a way that doesn’t work with position-independent code? This error does not happen when compiling with kit-ty-kate’s llvm-dune repo.

EDIT: I fixed the error with the following change:

diff --git a/llvm/bindings/ocaml/llvm/llvm_ocaml.c b/llvm/bindings/ocaml/llvm/llvm_ocaml.c
index f0e47a31af03..b83db58d9d3d 100644
--- a/llvm/bindings/ocaml/llvm/llvm_ocaml.c
+++ b/llvm/bindings/ocaml/llvm/llvm_ocaml.c
@@ -179,7 +179,7 @@ static value alloc_variant(int tag, value Value) {
 
 /*===-- Context error handling --------------------------------------------===*/
 
-void llvm_diagnostic_handler_trampoline(LLVMDiagnosticInfoRef DI,
+static void llvm_diagnostic_handler_trampoline(LLVMDiagnosticInfoRef DI,
                                         void *DiagnosticContext) {
   caml_callback(*((value *)DiagnosticContext), to_val(DI));
 }

I wonder why this fixes the issue, and if it should be upstreamed to LLVM? Is not using static a mistake?

alan · September 23, 2023, 2:41pm

I’ve successfully built the LLVM OCaml bindings in-tree using Dune and a shell script. _build/static and _build/shared both contain their own llvm.install files, but the llvm.install file that gets generated in the source directory always points to the static build artifacts.

jjb · September 23, 2023, 2:57pm

From the perspective of upstream integration, I wonder what the output of dune rules -m looks like and if it would enable building without having dune installed.

jbeckford · September 23, 2023, 3:24pm

(cd _build/static && ... -o llvm/llvm_stubs llvm/llvm_ocaml.o -ldopt '-L/home/me/llvm-project/builddune/lib -lstdc++ -fPIC ...'

Notice the single quote? You packed all of your flags into %{env:LLVMCore_LIB=}, but %{env:NAME=DEFAULT} is for single values. Kinda surprised it compiled at all. Just stick with putting your C flags and linker flags in the dune-workspace, or use a Dune special variable that expands to multiple values (ex. %{read-lines:...}). In both cases, the Dune manual is your friend.
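As a rough analogy for the difference between one packed value and a list of values, here is a small shell sketch. The flag string is made up, and Dune does not actually run a shell to expand its variables; the point is only that a single-valued expansion arrives at the linker driver as one argument, which matches the quoted blob in the log above.

```shell
#!/bin/sh
# Hypothetical flag string, shaped like the one packed into LLVMCore_LIB.
flags='-L/x/y/lib -lstdc++ -lLLVMCore -lLLVMSupport'

# Print how many separate arguments the function receives.
count_args() { echo "$#"; }

# Quoted: the whole string is passed as a single argument, like a
# single-valued expansion.
packed=$(count_args "$flags")

# Unquoted: the shell splits on whitespace, one argument per flag,
# like an expansion that yields multiple values.
split=$(count_args $flags)

echo "packed=$packed split=$split"
```

The linker needs the second shape: each flag as its own argument.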

alan · September 23, 2023, 3:30pm

How are LLVM pull requests tested? I know that when LLVM used Phabricator, a green check or red X was displayed for each patch. Now that LLVM has migrated to GitHub PRs, what is the automated testing story? Do the OCaml tests get run, and would requiring Dune to be installed cause any inconvenience?

If the bindings were to migrate to Dune upstream, do you think that the output of dune rules -m should be committed to the repository so that build machines (whether buildbots or developers’ personal computers) that don’t have Dune can build the bindings?

I think that if you’re doing OCaml work, you probably have Dune installed. Do you know the workflow of LLVM developers who modify the C bindings, and therefore occasionally need to touch the OCaml bindings, but don’t know OCaml? Do they have OCaml installed and build the OCaml bindings, or do they make changes to the OCaml bindings without building and testing them and just cross their fingers?

I think the main benefit of keeping the LLVM OCaml bindings in the repo is to put pressure on people who update the C bindings to also update the OCaml bindings. This pressure would require automated testing, with notifications if the OCaml bindings have build errors or test errors.

Before the Go and Python bindings were removed, their tests were seemingly disabled or not run, allowing them to fall into disrepair: [RFC] Remove the Go bindings - LLVM Project - LLVM Discussion Forums, D150642 [bindings] Remove LLVM python bindings

alan · September 23, 2023, 3:36pm

I was trying to follow your suggested format of

(context (default
 (name static)
 (env (_ 
  (link_flags (* output of `llvm_config --system-libs --link-static --libs` *) )
  (env-vars
   (LLVMCore_LIB /x/y/LLVMCore.a)
   (LLVMLinker_LIB /x/y/LLVMLinker.a))))))
(context (default
 (name shared)
 (env (_ 
  (link_flags (* output of `llvm_config --system-libs --link-shared --libs` *) )
  (env-vars
   (LLVMCore_LIB /x/y/LLVMCore.so)
   (LLVMLinker_LIB /x/y/LLVMLinker.so))))))

My shell script has something like

    core_libs=$(llvm_config $linking_mode --libs core support)
    analysis_libs=$(llvm_config $linking_mode --libs analysis)
    bitreader_libs=$(llvm_config $linking_mode --libs bitreader)
    bitwriter_libs=$(llvm_config $linking_mode --libs bitwriter)
    executionengine_libs=$(llvm_config $linking_mode --libs executionengine mcjit native)
    irreader_libs=$(llvm_config $linking_mode --libs irreader)
    transformutils_libs=$(llvm_config $linking_mode --libs transformutils)
    passes_libs=$(llvm_config $linking_mode --libs passes)
    target_libs=$(llvm_config $linking_mode --libs target)
    linker_libs=$(llvm_config $linking_mode --libs linker)
    all_backend_libs=$(llvm_config $linking_mode --libs $llvm_targets)

    echo "(context (default
 (name ${context_name})
 (env
  (_
   (c_flags $base_cflags)
   (env-vars
    (LLVMCore_LIB \"$ldflags $core_libs\")
    (LLVMAnalysis_LIB \"$ldflags $analysis_libs\")
    (LLVMBitReader_LIB \"$ldflags $bitreader_libs\")
    (LLVMBitWriter_LIB \"$ldflags $bitwriter_libs\")
    (LLVMExecutionEngine_LIB \"$ldflags $executionengine_libs\")
    (LLVMIRReader_LIB \"$ldflags $irreader_libs\")
    (LLVMTransformUtils_LIB \"$ldflags $transformutils_libs\")
    (LLVMPasses_LIB \"$ldflags $passes_libs\")
    (LLVMTarget_LIB \"$ldflags $target_libs\")
    (LLVMLinker_LIB \"$ldflags $linker_libs\")
    (LLVMAll_backends_LIB \"$ldflags $all_backend_libs\"))))))
" >> "dune-workspace"

Then, the dune file of a module looks something like:

(library
 (name llvm)
 (public_name llvm)
 (foreign_stubs
  (language c)
  (names llvm_ocaml))
 (c_library_flags %{env:LLVMCore_LIB=}))

Each module has different linking requirements, and hardcoding the paths is not good because the files can be installed in many different places based on how you built and installed LLVM.

Is this the wrong approach?

EDIT: Dune seems to recommend using (:include c_library_flags.sexp). However, the problem is that I don’t know how to switch them between contexts.

EDIT 2: I guess I can use the contexts to set an environment variable for the --link-static or --link-shared flag. However, generating a .sexp file for each module is going to be annoying…
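A minimal sketch of the idea in EDIT 2, assuming a hypothetical variable name LLVM_LINK_MODE and a dune language version of 3.0: each context carries only one single-valued environment variable holding the llvm-config linking flag, and everything module-specific is left to later steps.

```shell
#!/bin/sh
# emit_context NAME MODE: print one workspace context whose environment
# carries a single value. LLVM_LINK_MODE is a made-up variable name.
emit_context() {
  cat <<EOF
(context
 (default
  (name $1)
  (env
   (_
    (env-vars
     (LLVM_LINK_MODE $2))))))
EOF
}

# Write the workspace to a temporary file for illustration.
workspace=$(mktemp)
{
  echo "(lang dune 3.0)"
  emit_context static --link-static
  emit_context shared --link-shared
} > "$workspace"

cat "$workspace"
```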

jbeckford · September 23, 2023, 3:51pm

You deviated in two major ways from what I suggested:

1. You are not using (link_flags ...) for linker flags.
2. I had a single value (LLVMCore_LIB /x/y/LLVMCore.a). You have multiple values packed into (LLVMCore_LIB ...).

Which is why I suggested looking at the Dune manual.

Also: I don’t think discuss is the right venue to do a back-and-forth code review (and I wouldn’t sign up for that anyway).

alan · September 23, 2023, 4:26pm

Regardless, thank you for the tip and catch. I appreciate the help that you’ve given me.

I’ve rewritten the build files to use (:include c_library_flags.sexp) like the Dune docs recommend. There is more build machinery involved, but the issue with the single quotes is no longer present.
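For readers following along, a sketch of what generating the per-module c_library_flags.sexp files might look like. This is not the script from the PR: llvm_config here is a stand-in function with made-up output so the example runs without LLVM installed, and the output directory is a temporary one. Each generated file holds a single s-expression list, which is the shape an (:include ...) field reads.

```shell
#!/bin/sh
# Stand-in for the real llvm-config; the library lists are illustrative.
llvm_config() {
  case "$*" in
    *core*)     echo "-lLLVMCore -lLLVMSupport" ;;
    *analysis*) echo "-lLLVMAnalysis -lLLVMCore -lLLVMSupport" ;;
  esac
}

linking_mode=--link-static   # or --link-shared
out_dir=$(mktemp -d)         # hypothetical output location

# write_flags MODULE_DIR COMPONENT...: write the flags for one module as
# a parenthesised list in MODULE_DIR/c_library_flags.sexp.
write_flags() {
  dir=$1; shift
  mkdir -p "$out_dir/$dir"
  printf '(%s)\n' "$(llvm_config "$linking_mode" --libs "$@")" \
    > "$out_dir/$dir/c_library_flags.sexp"
}

write_flags llvm core support
write_flags analysis analysis

cat "$out_dir/llvm/c_library_flags.sexp"
```

Because each flag is a separate atom in the list, the flags reach the linker driver as separate arguments.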

alan · September 24, 2023, 3:25am

I managed to get the bindings to build with Dune, but I can’t get the tests to execute successfully. When I try to run the tests, I keep getting the error:

Fatal error: cannot load shared library dllllvm_stubs
Reason: /<path>/llvm-project/<build_dir>/lib/ocaml/stublibs/dllllvm_stubs.so: undefined symbol: _ZNKSt3_V214error_category10_M_messageB5cxx11Ei

The CMake build system does something with rpath: https://github.com/llvm/llvm-project/blob/186a4b3b657878ae2aea23caf684b6e103901162/llvm/cmake/modules/AddOCaml.cmake#L152.

  if( APPLE )
    set(ocaml_rpath "@executable_path/../../../lib${LLVM_LIBDIR_SUFFIX}")
  elseif( UNIX )
    set(ocaml_rpath "\\$ORIGIN/../../../lib${LLVM_LIBDIR_SUFFIX}")
  endif()
  list(APPEND ocaml_flags "-ldopt" "-Wl,-rpath,${ocaml_rpath}")

I’m not sure if passing this flag would solve the issue, and if so, how I would best accomplish it using Dune. I’m guessing that as the OCaml library build artifacts are stored in <build_dir>/lib/ocaml/llvm for (the upstream, unforked) LLVM, the rpath tells them to look three directories up, then back down into lib? I am by no means an expert on these topics, however.
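To check where that rpath would point, here is a small sketch. It assumes the stub library sits in <build_dir>/lib/ocaml/stublibs, as in the error message above, and that LLVM_LIBDIR_SUFFIX is empty. The dynamic loader substitutes $ORIGIN with the directory containing the object being loaded, so the lookup can be imitated with an ordinary path resolution in a throwaway directory tree.

```shell
#!/bin/sh
# Throwaway tree mimicking <build_dir>/lib/ocaml/stublibs.
build_dir=$(mktemp -d)
mkdir -p "$build_dir/lib/ocaml/stublibs"

# $ORIGIN stands for the directory holding the loaded shared object.
origin="$build_dir/lib/ocaml/stublibs"

# Resolve $ORIGIN/../../../lib to a physical path.
resolved=$(cd "$origin/../../../lib" && pwd -P)
expected=$(cd "$build_dir/lib" && pwd -P)

if [ "$resolved" = "$expected" ]; then
  echo "rpath resolves to <build_dir>/lib"
fi
```

Under these assumptions the rpath points at the directory where the LLVM libraries themselves are built.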

I tried putting -ldopt -Wl,-rpath,-Wl,-rpath,$ORIGIN/../../../lib in c_library_flags.sexp, but got the error /usr/bin/ld: cannot find -ldopt: No such file or directory. There is an open issue with a similar problem: Linker flags vs. libraries · Issue #2977 · ocaml/dune · GitHub

Edit:

The following change on the upstream LLVM main branch does not break the tests:

diff --git a/llvm/cmake/modules/AddOCaml.cmake b/llvm/cmake/modules/AddOCaml.cmake
index 891c9e6d618c..a9835518e43e 100644
--- a/llvm/cmake/modules/AddOCaml.cmake
+++ b/llvm/cmake/modules/AddOCaml.cmake
@@ -151,7 +151,7 @@ function(add_ocaml_library name)
   elseif( UNIX )
     set(ocaml_rpath "\\$ORIGIN/../../../lib${LLVM_LIBDIR_SUFFIX}")
   endif()
-  list(APPEND ocaml_flags "-ldopt" "-Wl,-rpath,${ocaml_rpath}")
+  #list(APPEND ocaml_flags "-ldopt" "-Wl,-rpath,${ocaml_rpath}")
 
   add_custom_command(
     OUTPUT ${ocaml_outputs}

So my Dune experimentation must have other issues.

EDIT 2: So, the tests work for native, but not for bytecode. Interesting…

--- a/llvm/test/Bindings/OCaml/analysis.ml
+++ b/llvm/test/Bindings/OCaml/analysis.ml
@@ -1,6 +1,4 @@
 (* RUN: rm -rf %t && mkdir -p %t && cp %s %t/analysis.ml
- * RUN: %ocamlc -g -w +A -package llvm.analysis -linkpkg %t/analysis.ml -o %t/executable
- * RUN: %t/executable
  * RUN: %ocamlopt -g -w +A -package llvm.analysis -linkpkg %t/analysis.ml -o %t/executable
  * RUN: %t/executable
  * XFAIL: vg_leak
diff --git a/llvm/test/Bindings/OCaml/bitreader.ml b/llvm/test/Bindings/OCaml/bitreader.ml
index 2638ca9d8c76..6b05e3d64b2f 100644
--- a/llvm/test/Bindings/OCaml/bitreader.ml
+++ b/llvm/test/Bindings/OCaml/bitreader.ml
@@ -1,6 +1,4 @@
 (* RUN: rm -rf %t && mkdir -p %t && cp %s %t/bitreader.ml
- * RUN: %ocamlc -g -w +A -package llvm.bitreader -package llvm.bitwriter -linkpkg %t/bitreader.ml -o %t/executable
- * RUN: %t/executable %t/bitcode.bc
  * RUN: %ocamlopt -g -w +A -package llvm.bitreader -package llvm.bitwriter -linkpkg %t/bitreader.ml -o %t/executable
  * RUN: %t/executable %t/bitcode.bc
  * RUN: llvm-dis < %t/bitcode.bc

alan · September 24, 2023, 5:00pm

I’ve figured out the issue. I needed to add the -custom flag when running ocamlc to statically link the C code into the bytecode executable, as I don’t have the CMake variable BUILD_SHARED_LIBS set. The corresponding line in the CMake file is https://github.com/llvm/llvm-project/blob/0d821b22e01b8205b680ca302caeb6516611ecf8/llvm/cmake/modules/AddOCaml.cmake#L66.

alan · September 24, 2023, 11:53pm

I can now successfully build the LLVM OCaml bindings with Dune in the LLVM source tree by calling the Dune build system from CMake and run the tests successfully. I have a PR at [OCaml] Build OCaml bindings using Dune by alan-j-hu · Pull Request #67272 · llvm/llvm-project · GitHub.

There is a shell script for a configure step in which the user can pass necessary variables and settings. I believe that a pre-build script is necessary because Dune can’t handle the level of dynamic behavior, nor do I expect Dune to. (Build systems with the desired level of dynamism tend to become entire, and often ugly, scripting languages.) Having a similar ./configure.sh step prior to make is very common for C libraries.

I expect that both LLVM’s CMake build system and opam should be able to call the configure script to set the appropriate parameters, and then the LLVM OCaml bindings should be able to be distributed on opam-repository without needing patch files or a separate repo.

I don’t think that keeping the LLVM OCaml bindings in-tree guarantees that they will stay up to date, but I think that keeping them in-tree helps. Evidently, the Python and Go bindings were kept in-tree but nevertheless became unmaintained, so they were moved out of tree as a result. However, even when nobody was really maintaining the OCaml bindings, people making changes to the C API would also make the corresponding changes to the OCaml bindings. I think that integrating the OCaml bindings with the rest of LLVM’s build system and tests helped create pressure to do this.

I am reluctant to move the LLVM OCaml bindings out of tree because I think it’s important for the OCaml bindings to sync with the C API and having them in-tree creates the pressure to do this. I want the OCaml bindings to be maintained in the main branch of LLVM instead of playing catch-up each time a new version of LLVM is released. For comparison, the Haskell LLVM bindings have recent commits yet seem to have fallen behind: GitHub - llvm-hs/llvm-hs: Haskell bindings for LLVM (the default branch is for LLVM 12, and there is a huge skip to LLVM 15). The SML LLVM bindings were last updated in 2014: GitHub - melsman/sml-llvm: Standard ML Bindings for LLVM

The LLVM OCaml bindings also assume that LLVM pointers are at least two-byte aligned (the least significant bit is 0), which lets them represent LLVM pointers as integers instead of allocating an OCaml block. I feel comfortable doing this as long as the bindings are in-tree and therefore an “official” part of the LLVM project. However, I worry that if the bindings were made “unofficial,” making assumptions about LLVM implementation details may not be as safe and alignment guarantees could change without notice.
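The arithmetic behind this assumption can be sketched with plain integers. The OCaml runtime treats a word whose lowest bit is 1 as an immediate integer rather than a pointer to a heap block, so an even address can be disguised by setting that bit and recovered by clearing it. The address below is made up, and the real bindings do this in C rather than in shell.

```shell
#!/bin/sh
# Made-up, even address: its lowest bit is 0.
ptr=$((0x7f00001000))

# Setting the lowest bit makes the word look like an immediate integer.
tagged=$((ptr | 1))

# Flipping the bit back recovers the original address.
untagged=$((tagged ^ 1))

# An odd address already has the bit set, so the round trip loses it.
odd=$((ptr + 1))
odd_roundtrip=$(( (odd | 1) ^ 1 ))

echo "even address round-trips: $((untagged == ptr))"
echo "odd address round-trips:  $((odd_roundtrip == odd))"
```

The second case is why the scheme depends on the alignment guarantee: a pointer with its lowest bit set could not be recovered.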

As for contributing to LLVM, I think that when LLVM used Phabricator and Arcanist, the workflow was harder, dissuading people from contributing, but now that LLVM uses GitHub PRs, more people should be familiar with the contribution process. Even though not everyone on the LLVM project agrees with the move, I hope that GitHub PRs will get rid of the barrier for contributing due to how ubiquitous GitHub is today.

mobileink · September 25, 2023, 1:29am

I can build it with Bazel but some of the tests fail. Neither the OPAM package nor the PR you mentioned provides any info about tests. I’ve extracted the testsuite from llvm/test/Bindings/OCaml for several different versions of the LLVM toolchain without success. I would be grateful if you could provide the test sources, so I can determine if there’s a problem with my build.

FYI I’m running OCaml 5.1 (and 4.14.0) on MacOS Ventura 13.5.2, M2 max (which llvm reports as aarch64).

Thanks,
Gregg

alan · September 25, 2023, 2:41am

Sure. I’m not an expert on the LLVM test harness, but a big part of it seems to be https://github.com/llvm/llvm-project/blob/main/llvm/test/lit.cfg.py and https://github.com/llvm/llvm-project/blob/main/llvm/test/lit.site.cfg.py.in. The first file has some parts that set what arguments to pass to ocamlfind and sets the relevant OCaml environment variables, and the second file is merely a configuration file that points to ocamlfind, says whether ocamlopt is installed, and has OCaml compiler flags.

These two lines from lit.cfg.py say that the OCaml libraries are built in <llvm lib dir>/ocaml and <llvm lib dir>/ocaml/llvm, and when I build the OCaml bindings in-tree, this location of the build directory is indeed where the META files and the .a and .cm[a,i,t,ti,x,xa] files are saved.

If you notice, the OCaml tests have comments that tell the LLVM test harness what commands to run and what IR output to expect: https://github.com/llvm/llvm-project/blob/main/llvm/test/Bindings/OCaml/core.ml. If you remove the OCaml tests from the LLVM test harness and execute them as a standalone, the IR output won’t be checked because the expected IR is written as special comments understood by LLVM’s test harness. All the LLVM tests look like this. I think that the OCaml tests need to be integrated in LLVM’s test system to ensure the OCaml bindings are kept up to date.
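To make the "special comments" concrete, here is a rough sketch that pulls the RUN commands out of a test file. The sample header reuses the RUN lines from analysis.ml shown earlier in the thread; the test body is a placeholder, and the sed expression is only an approximation of what the harness does (the real one also substitutes %s, %t and friends, and checks the output).

```shell
#!/bin/sh
# Write a miniature test file with harness directives in an OCaml comment.
test_file=$(mktemp)
cat > "$test_file" <<'EOF'
(* RUN: rm -rf %t && mkdir -p %t && cp %s %t/analysis.ml
 * RUN: %ocamlopt -g -w +A -package llvm.analysis -linkpkg %t/analysis.ml -o %t/executable
 * RUN: %t/executable
 * XFAIL: vg_leak
 *)
let () = print_endline "placeholder test body"
EOF

# Keep only the text after "RUN:" on lines that contain it.
run_lines=$(sed -n 's/^.*RUN: *//p' "$test_file")
count=$(printf '%s\n' "$run_lines" | wc -l | tr -d ' ')

echo "$run_lines"
echo "found $count RUN lines"
```

Compiling and running the .ml file on its own ignores every one of these directives, which is why the tests lose most of their value outside the harness.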

I’m not in favor of removing the OCaml bindings from the tree because I think that without the LLVM test harness, a lot of important checks in the OCaml tests will be lost. I also want the OCaml tests to be run whenever the C API gets changed to prevent bitrot.
