LLVM: symbol not found for bytecode compilation

Sure. Please feel free.

Before you go on reporting upstream: isn’t that just a linking bug? I.e. a missing dependency in the .cma or in the DLL stub?

That may be the case. I don’t know at this moment.

FYI, the source of the llvm OCaml binding (i.e. how the .cma and .dll are made) lives in llvm-trunk, not in opam,
e.g. AddOCaml, OCaml/binding.

The command that builds the OCaml binding from the llvm source is in opam, here

Even if it cannot be solved by changing the build command, it’s possible to fix it with a patch in opam, the way the build process is already patched.

It takes time to figure out the root cause and then decide where to put the fix.

Small update:

$ opam switch 4.11.2 # eval .. & opam install llvm
$ ocamlfind ocamlc -package llvm -linkpkg -o a a.ml
$ ./a
; ModuleID = 'AAAAAAAA!'
source_filename = "AAAAAAAA!"

So, most probably, the problem is not on the LLVM side.

The code works in utop (with #require "llvm";;) and in the ocaml toplevel.

# #require "llvm";;
/home/exx/.opam/4.12.0.many/lib/llvm: added to search path
/home/exx/.opam/4.12.0.many/lib/llvm/shared/llvm.cma: loaded
# open Llvm

let ctx = global_context ()
let mdl = create_module ctx "AAAAAAAA!"
let () = dump_module mdl;;

; ModuleID = 'AAAAAAAA!'
source_filename = "AAAAAAAA!"

The loader works.

So it works with ocamlopt for versions 4.11.2, 4.12.1, 4.13.1 and works with ocamlc for version 4.11.2, but not for 4.12 and 4.13.

It also works from an interactive session in 4.12 and 4.13. Interesting. And it looks less and less like an LLVM problem.

Since it runs in an interactive session you can also add a

#use "topfind";;
#require "llvm";;

at the start of the file and then it will run just fine
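
For reference, a minimal sketch of what that file could look like, written out from the shell. The file name a.ml and the three OCaml definitions are the ones already used in this thread; nothing else is assumed beyond the llvm opam package being installed in the current switch.

```shell
# Write the test program with the topfind directives at the top.
# The OCaml definitions after `open Llvm` are the ones from the session above.
cat > a.ml <<'EOF'
#use "topfind";;
#require "llvm";;
open Llvm

let ctx = global_context ()
let mdl = create_module ctx "AAAAAAAA!"
let () = dump_module mdl
EOF

# Run it as a toplevel script (needs the llvm package in the current switch):
#   ocaml a.ml
```

Note that the directives only make sense to the toplevel, so a file written this way can no longer be passed to ocamlc or ocamlopt.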

I’ve run ocamlobjinfo $OPAM_SWITCH_PREFIX/lib/llvm/shared/llvm.cma and found the symbol that is reported as not found. Relevant snippet below

	llvm_set_module_identifier
	llvm_get_module_identifier
	LLVMGetModuleContext
	llvm_set_module_inline_asm
	llvm_string_of_llmodule

As you can see, one of the symbols does not look like the others. LLVMGetModuleContext is the one which was not found, and it is the last one of that shape, so there is a fair chance that it will be the first to be checked.
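
A rough way to spot such odd-one-out primitives automatically is sketched below. The helper name non_stub_primitives is made up for illustration, and the sketch assumes that ocamlobjinfo lists primitives one per line, indented, under a header line containing "Primitives declared in this module:"; if your version prints something different the pattern needs adjusting.

```shell
# Hypothetical helper: read `ocamlobjinfo` output on stdin and print every
# declared primitive that does not start with the llvm_ stub prefix.
# Assumption: primitives are indented lines under a header containing
# "Primitives declared in this module:".
non_stub_primitives() {
  awk '
    /Primitives declared in this module:/ { in_prims = 1; next }
    # Any non-indented line starts a new field and ends the primitive list.
    substr($0, 1, 1) != "\t" && substr($0, 1, 1) != " " { in_prims = 0 }
    in_prims && NF == 1 && $1 !~ /^llvm_/ { print $1 }
  '
}

# Usage:
#   ocamlobjinfo "$OPAM_SWITCH_PREFIX/lib/llvm/shared/llvm.cma" | non_stub_primitives
```

Anything this prints is an external that points straight at a C symbol rather than at a binding stub.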

This could be related to this change introduced in 4.12.0 - ocaml/Changes at trunk · ocaml/ocaml · GitHub

- #9551: ocamlc no longer loads DLLs at link time to check that
  external functions referenced from OCaml code are defined.
  Instead, .so/.dll files are parsed directly by pure OCaml code.
  (Nicolás Ojeda Bär, review by Daniel Bünzli, Gabriel Scherer,
   Anil Madhavapeddy, and Xavier Leroy)

Somehow trumping my own words about opam not exploding in my hands I don’t manage to install the llvm opam package on osx to repro this.

But could you verify everything looks correct on the cma with:

> ocamlobjinfo $OPAM_SWITCH_PREFIX/lib/llvm/shared/llvm.cma | grep 'Extra'

Check that Extra dynamically-loaded libraries mentions the stubs dll library, and check via ldd that the stubs library itself, likely in $(opam var lib)/stublibs, has the dependency on the llvm library that actually defines that symbol.
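
Both checks can be wrapped up as follows. This is only a sketch: the function names are made up, and the stub file name dllllvm.so in the usage lines is an assumption, so adjust it to whatever is actually installed in stublibs.

```shell
# Show the "Extra ..." fields recorded in a .cma.
cma_extra_fields() {
  ocamlobjinfo "$1" | grep 'Extra'
}

# Show which LLVM shared libraries a stub library is linked against.
stub_llvm_deps() {
  ldd "$1" | grep -i 'llvm'
}

# Usage (dllllvm.so is an assumed file name):
#   cma_extra_fields "$OPAM_SWITCH_PREFIX/lib/llvm/shared/llvm.cma"
#   stub_llvm_deps "$(opam var lib)/stublibs/dllllvm.so"
```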


Ah, but wait: it seems the stubs have direct dependencies on the llvm library symbols, so llvm.cma likely also needs a direct dependency from the cma to the llvm library in Extra dynamically-loaded libraries, not only on the stub library.

I don’t have these things in my head but it could indeed be a regression from the change you mentioned.

IIRC, before that change, the libraries listed in ‘Extra dynamically-loaded libraries’ would be dynlinked into the compiler at primitive check time. This would of course also dynlink the stubs’ dependencies, so if your stubs library depended on other libraries you could use their symbols as externals directly, and the check would find them even though they were not in ‘Extra dynamically-loaded libraries’ proper.

After #9551 this is no longer the case.
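
To see the difference from the outside, one can ask whether a given shared library defines a symbol itself, as opposed to merely pulling it in through its own dependencies. The sketch below approximates that with GNU nm; it is not how ocamlc performs the check (per the changelog it parses the files in pure OCaml), and the helper name and the library paths in the usage lines are assumptions.

```shell
# defines_symbol LIB SYM
# Succeeds only if the shared library LIB itself exports SYM. Symbols that
# LIB imports from its own dependencies do not count.
defines_symbol() {
  nm -D --defined-only "$1" |
    awk -v sym="$2" '
      { name = $NF; sub(/@.*/, "", name) }   # drop any @VERSION suffix
      name == sym { found = 1 }
      END { exit !found }
    '
}

# Usage (paths are assumptions, adjust to your system):
#   defines_symbol "$(opam var lib)/stublibs/dllllvm.so" LLVMGetModuleContext && echo "in stubs"
#   defines_symbol /usr/lib64/libLLVM.so LLVMGetModuleContext && echo "in libLLVM"
```

If the diagnosis in this thread is right, only the second call should report a hit.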


I also happen to think that the problem is on this line in llvm.ml

external module_context : llmodule -> llcontext = "LLVMGetModuleContext"

I guess, the reason this function is written the way it is, is because it accepts an opaque pointer and returns an opaque pointer.

If this is the case, it should be possible to write a smaller test case than the LLVM one.

I’ve run ocamlobjinfo on both the static and shared versions of llvm.cma. The output is mostly the same, but not quite

shared:

Force custom: no
Extra C object files: -lllvm -lstdc++ -lLLVM-13 -lrt -ldl -lm -lz -ltinfo
Extra C options: -L$CAMLORIGIN/../.. -Wl,-rpath,$CAMLORIGIN/../.. -L/usr/lib64

static:

Force custom: YES
Extra C object files: -lllvm -lstdc++ -lLLVMSupport -lLLVMCore -lLLVMRemarks -lLLVMBitstreamReader -lLLVMBinaryFormat -lLLVMSupport -lLLVMDemangle -lrt -ldl -lm -lz -ltinfo

The reason the static version works is, probably, Force custom: YES which links the object code in the cma.

Cool!

From your discussion, I get the idea of how to try and experiment on the potential fix.

From #9551 I also got a hint as to why a previous z3 fix works locally but not on the CI: it may be related to the version of OCaml.

ocamlfind ocamlc -verbose -package llvm -linkpkg -dllib /usr/lib/llvm-13/lib/libLLVM -o a2.bc a.ml
./a2.bc

This works on my ubuntu.

The fix I would make is to put this dllib into the META, so that ocamlfind could set it for us.

It works also on Fedora with the following change -dllib /usr/lib64/libLLVM.so
I will be OK with that fix, but it also should probably go upstream?

PS. Out of curiosity I’ve done a

$ cd $OPAM_SWITCH_PREFIX/lib/llvm/
$ ls -l {shared,static}/llvm.cma 
-rw-r--r--. 1 rv rv 43950 Oct 29 09:15 shared/llvm.cma
-rw-r--r--. 1 rv rv 44055 Oct 29 09:14 static/llvm.cma

I was expecting a bigger difference in size.

Oh, I didn’t make it clear. The META file is in llvm-trunk, so I will make a PR there (and an opam package with this fix before the next llvm release if necessary).

As for the size, I guess the difference lies in how libllvm.so and llvm.a are used, e.g. whether the final executables are built against the .so or the .a. The stubs themselves may not differ much.

The field I was interested in to check my theory is in fact Extra dynamically-loaded libraries. The libraries mentioned in this field are 1) dynamically loaded when the byte code is run, and 2) used at byte code link time: primitives found in the byte code are checked for existence in these libraries so that you don’t get obvious missing symbol surprises when you run the executable.

In any case I think the problem is pinned down.

There’s more than one fix but solving that in the META is not such a good idea. You can either:

  1. Add the library to the Extra dynamically-loaded libraries field of the library by specifying -dllib -lLLVM when creating llvm.cma
  2. Stop using naked pointers (this means you won’t refer to symbols from the llvm library directly and the problem goes away).

Even though it will be more work I highly suggest 2. Naked pointers are deprecated and on their way out because of multicore. See this thread for background information and this section of the manual on how to go about this.
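
For option 1, the flag involved can be sketched as below. This is not the upstream build rule (the llvm CMake files may invoke a different tool); make_cma is a made-up helper, -dllib -lllvm for the binding’s own stubs is an assumption based on the ocamlobjinfo output above, and llvm.cmo is a placeholder object name. As far as I understand the ocamlc manual, -dllib -lNAME refers to a file named dllNAME.so, so the exact argument may need adjusting; the path form in the usage line is the one reported to work earlier in this thread.

```shell
# make_cma OUT DLLIB OBJ...
# Build a bytecode library whose "Extra dynamically-loaded libraries" field
# names DLLIB in addition to the binding's own stub library.
# OCAMLC can be overridden, e.g. OCAMLC=echo for a dry run.
make_cma() {
  out=$1; dllib=$2; shift 2
  # -dllib -lllvm (the binding's stubs) is an assumption, see above.
  ${OCAMLC:-ocamlc} -a -o "$out" -dllib -lllvm -dllib "$dllib" "$@"
}

# Usage (llvm.cmo is a placeholder):
#   make_cma llvm.cma /usr/lib/llvm-13/lib/libLLVM llvm.cmo
#   ocamlobjinfo llvm.cma | grep 'Extra'
```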

I see.

I appreciate the two fixes you suggest. I agree that choice 1 is much better than changing META and is currently easier to implement than 2.

As for 2, it’s a thorough fix despite the workload. Besides, I don’t know but would like to: if the change is large enough, would ocaml-ctypes be a better way to make an OCaml binding?

I had a good experience using ocaml-ctypes and I would definitely use it if I had to deliver a large binding under time pressure. But on the other hand it adds a non-trivial layer that you could end up having to understand (there’s a paper about it here).

So personally I still often write my bindings with the bare OCaml FFI. Because I have a reasonable understanding of it and less dependencies means less bitrot in my projects. While it is certainly more error prone, it’s not so hard to use if you follow the rules and don’t try to be smart.


One practical consideration about how to implement bindings for llvm is that the upstream build bots, and llvm developers generally, do not have many ocaml-specific tools. I’m not sure, but it might be difficult to use an approach that needs additional tools or libraries to be installed. Another thing to consider is @kit-ty-kate’s llvm-dune project.

I have also been thinking about the question of where one language binding of a library should reside.

My observation on z3 and llvm is

  1. the source code of the binding is in their source trunk.
  2. META is in their source trunk.
  3. OCaml binding is built with specific commands.
  4. The binding building requires OCAMLFIND to be installed.
  5. opam files are in OCaml’s opam-repository. So, it’s out of their business to publish them.

It seems that, to build the binding, ocamlfind is the minimal tool to have (along with ocaml itself, of course).

This is for the case when the binding is in-the-tree. There is also the case when it’s totally out-of-tree, e.g. clangml.

After all, who cares more about whether OCaml users can use the library smoothly? Is it a seller’s market or a buyer’s market?

Personally, I think llvm-dune is definitely a better solution to distribute llvm in the OCaml world.

I agree that there are advantages to having the bindings out of tree. There are also advantages to having them in the tree though. I have seen more cases of a test failing when something is changed in the LLVM C API, leading the patch author to fix the ocaml bindings. I don’t know the optimal way, just noting that there are advantages both ways.

Currently, the blocker to having the opam file in the upstream trunk is the patches in the opam package that are applied to adjust the build system. If the differences between the upstream build scripts and the opam package could be eliminated, and if the opam file could be placed in the ocaml bindings directory (not at the root of the repo), then I do not think that there would be problems adding the opam file upstream.

Please feel free to add me as a reviewer to any changes sent upstream.