My fix of the linking problem on z3 OCaml binding

Problems on OCaml z3 binding

It’s frustrating that there have been subtle problems to use OCaml z3 binding: libc++abi.dylib terminating with uncaught exception (spoiler: I don’t solve this but bypass it) dyld: Library not loaded: libz3.dylib (spoiler: I SOLVED this) etc. Many OCaml users including me suffer from this at least for more than two years after z3 4.8.1 (z3#2305). @ivg once fixed it with an excellent explanation at z3#4468 (and the related post here). However, libc++abi error comes back to me again (z3#5578) when I use it on macOS.

So, there are two problems now:

  1. The staticlib-by-default doesn’t work for macOS users. It’s set so after z3.4.8.1. Z3 uses the exception handle internally in some use-case, however, ABI mismatch breaks those code. That’s why some people are still using z3.4.8.1 on macOS.

  2. z3 binding with the shared lib doesn’t work perfectly since users still need to set LD_LIBRARY_PATH manually or even worse on macos.

For problem 1, with some discussion in z3#5578, I made two opam files for package z3(it uses the shared library by default) and z3-static that can be used for the next z3 release. Problem 1 can be solved by choosing the one you can use.

Problem 2 is also solved with z3#5616 and z3#5618. In my local testing, OCaml z3 binding works correctly under any combination of {ocamlopt, ocamlc, ocamlc-with-custom} * {z3 shared lib, z3 static lib} * {ubuntu, macOS}, without any change on LD_WHATSOEVER_LIBRARY_PATH. Bravo!

I would like to share my ideas (and sufferings) here and I am guessing it may help people facing similar problems.

A glimpse of z3ml’s building, installing, and using

source gist: OCaml z3 linking · GitHub (live editor: https://mermaid.live/ which can load gist)

In the building of z3’s ml binding (it occurs in <z3_source>/build), we see these generated rules for z3 ml stublibs.

ocamlmklib -o api/ml/z3ml -I api/ml -L. api/ml/z3native_stubs$(OBJ_EXT) api/ml/z3enums.cmo api/ml/z3native.cmo api/ml/z3.cmo -lz3 -lstdc++

ocamlmklib will generate libz3ml.a dllz3ml.so and z3ml.cma. These files together with META and others are installed later via ocamlfind install z3 <all_these_files> into <opam-sitelib>/stublibs and <opam-sitelib>/z3. The opam-sitelib is usually ~/.opam/<your_switch>/lib.

When the user requires the z3 library, its entries in META will expand to the files in the package automatically.

ocamlfind ocamlc -o z1.bc -verbose -thread -package z3 -linkpkg z1.ml
ocamlfind ocamlopt -o z1.exe -verbose -thread -package z3 -linkpkg z1.ml

It works nicely if the original z3 is a static archive library. In ocamlmklib, the flags -lz3 and -L. locates <z3_source>/build/libz3.a and makes large files libz3ml.a dllz3ml.so. There is no problem when using the static library. libz3ml.a and libz3.a is installed into <opam-sitelib>/z3 and -package z3 will expands to -I <opam-sitelib>/z3 for ocamlc and -L<opam-sitelib>/z3 for ocamlopt.

However, it does have problems if using z3 shared library a.k.a. using libz3.so (or libz3.dylib). Both linkers and loaders are fragile and error-prone.

Linker and loader

(I don’t look up a textbook. Correct me if I am wrong.)

During the compilation, the linker finds and stores the dependent libraries in the compiled object.

During the execution of native-code or the interpretation of bytecode, the loader finds and loads the dependent libraries.

Assuming the OCaml binding is built with z3 shared library, when the user requires the z3 library

ocamlfind ocamlc -o z2.bc -verbose -thread -package z3 -linkpkg z1.ml
./z2.bc

It can build but running ./z2.bc fails because ocamlrun can load dllz3ml.so but it cannot find libz3.so required by dllz3ml.so.

We can provide the search path a.k.a where to find libz3.so for the loader to ocamlmklib via -dllpath <path/to/libz3.so>. This path is stored in dllz3ml.so and later the loader can search libz3.so in that path.

ocamlmklib -o api/ml/z3ml -I api/ml -L. api/ml/z3native_stubs$(OBJ_EXT) api/ml/z3enums.cmo api/ml/z3native.cmo api/ml/z3.cmo  -lz3-static -lstdc++ -dllpath $(ocamlfind printconf path)/stublibs

If we are using this, the above ocamlc ... and ./z2.bc work. However, the following command complains in the building:

ocamlfind ocamlopt -o z2.exe -verbose -thread -package z3 -linkpkg z1.ml

ocamlopt (actually the cc) linker cannot find libz3.so required by dllz3ml.so.

We can also provide search path for the linker to ocamlmklib via -ccopt -L<path/to/libz3.so>. This path is stored in z3ml.cma and later the linker can search libz3.so in that path.

ocamlmklib -o api/ml/z3ml -I api/ml -L. api/ml/z3native_stubs$(OBJ_EXT) api/ml/z3enums.cmo api/ml/z3native.cmo api/ml/z3.cmo  -lz3-static -lstdc++ -ccopt -L$(ocamlfind printconf path)/stublibs -dllpath $(ocamlfind printconf path)/stublibs

What if we have have -ccopt -L<path/to/libz3.so> but no -dllpath <path/to/libz3.so>? ocamlc can build but the bytecode cannot run. ocamlopt can build but the nativecode cannot run.

With both -ccopt -L and -dllpath, everyone is satisfied … except for the mac users!

What’s wrong with macOS!?

Even with the previous fix, this error still occurs:

dyld: Library not loaded: libz3.dylib
  Referenced from: blabla
  Reason: image not found
make: *** [test] Abort trap: 6

Some workaround solutions include

  1. Modify LD_LIBRARY_PATH. However, it may not work now due to System Integrity Protection (SIP, wiki).
  2. Copy libz3.dylib to your working path.

Answer: the problem is inside libz3.dylib. Unlike the ELF and the loader convention used in Linux, Mach-O also stores the library id inside the library file under section LC_ID_DYLIB. What is not loaded is not the file libz3.dylib but the id libz3.dylib in the search path.

According to the apple document on dynamic libraries, using a filename without paths or qualifiers means it will be searched only in ~/lib, /usr/local/lib, /usr/lib and the working path.

The id libz3.dylib in the LC_ID_DYLIB section of file libz3.dylib will propagate to any other shared or executable objects who uses it, stored in the LC_LOAD_DYLIB section. Therefore, the stublib dllz3ml.so and executables made with ocamlopt will all have this.

The fix is to change library id libz3.dylib inside dllz3ml.so (via install_name_tool -change) and libz3.dylib (via install_name_tool -id) to the absolute installed path <path/to/libz3.dylib>. We can do a similar game by just fixing one and check how the compiler, linker, loader, and interpreter work. It is exactly what I was doing when trying fixing this problem.

Now everyone is satisfied … including the mac users!

In hindsight, brew install --build-from-source --keep-tmp --debug z3 have this output, showing the platform experts also solve the problem by changing dylib ID.

==> Changing dylib ID of /usr/local/Cellar/z3/4.8.12/lib/libz3.dylib
  from libz3.dylib
    to /usr/local/opt/z3/lib/libz3.dylib
==> Changing dylib ID of /usr/local/Cellar/z3/4.8.12/lib/python3.9/site-packages/z3/lib/libz3.dylib
  from libz3.dylib
    to /usr/local/opt/z3/lib/python3.9/site-packages/z3/lib/libz3.dylib

Discussion

There are several glitches in my debugging process, and I don’t even have an idea on which is the best solution.

  1. On macos, ocamlmklib generates lib<foo>.a and dll<foo>.so. Would dll<foo>.dylib be more consistent with the system convention?
  2. On macos, ocamlfind install <foo>.dylib will be in the <opam-sitelib>/foo rather than <opam-sitelib>/stublibs by default. Would going to <opam-sitelib>/stublibs better for *.dylib?
  3. Who is the best guardian to set the installed path of a library? ocamlmklib (via passing it gcc -dynamiclib -install_name), ocamlfind install, opam?
  4. ocamlmklib -dllpath <path/to/foo> may not work on macos. It can be checked by otool -l <shared_library> | grep LC_RPATH -A2. What’s more, if the library’s name is not like @rpath/foo.dylib, even setting rpath successful may be useless.
  5. I saw ocamlfind: [WARNING] You have installed DLLs but the directory <opam-sitelib>/stublibs is not mentioned in ld.conf several times. I am thinking why ocamlopt or ocamlc don’t take <opam-sitelib>/stublibs in the default search path for the linker and loader? Or shall opam switch be in charge of this?

Reference

https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/UsingDynamicLibraries.html

OCaml manuals on ocamlc, ocamlopt, ocamlrun, etc.
many github issues and discourse posts on z3

6 Likes

Wow… thank you for your very thorough work on this! :star_struck:

1 Like

Thanks for the excellent write-up! It is great that we can now run z3 bindings on macOS in the dynamic linking mode. Though I think it still should be possible to get away with the ABI problem in the static mode.

The links do not work, could you please update them? I presume that the problem arises from the ABI mismatch due to the use of the code units built by different compilers. We have to be sure that we build z3 with the same compiler that was used to build the dependencies (e.g., libc++). We can achieve it either by properly selecting the dependencies (e.g., we can select between libstdc++ and libc++) or by selecting a compiler that is building z3 (clang++ vs g++). It is a fundamental issue that is not actually specific to z3, but rather to C++. For example, in BAP we have the same problem with loading llvm and we solve it by ensuring that we use clang++ and libc++ on macOS. Now, when I review my own patch, I am wondering why have I hardcoded libstc++. It is definitely a shortcoming and in general, we should be careful in selecting the C++ library (again it should be the function of the compiler that is used to build z3 and the compiler itself should be selectable as well).

Besides, have you tried building z3 using clang++ and dynamically linking with libc++? I believe it should resolve the ABI mismatch problem.

Bravo indeed! But we have also to consider the dlopen loader, which commonly has a more strict behavior than the operating system dynamic loader. It doesn’t allow any unresolved symbols that is why we (BAP) prefer statically linking the dependencies into cmxs files, so that we can be sure that they are loaded in runtime without any problems.

P.S. to be honest I stopped using macOS for development as they make the programmers’ life harder with each new release. I am currently lagging five versions behind, sill on El Captain :frowning:

P.P.S and also there is Windows, and flexdll, which adds a couple more of dimensions to the problem.

The links do not work, could you please update them?

Fixed. They are just references to lines in opam files.

I agree that the ABI mismatch is fundamental to C++. I don’t do it in this fix though it seems the starting point why I decided to solve it. It’s fun to see in https://libcxx.llvm.org/ ABI compatibility with gcc’s libstdc++ is listed in Features and Goals, which is unclear to me if it’s done or not. (I love your table on this). However, I have totally no ideas how to debug this in detail.

Besides, have you tried building z3 using clang++ and dynamically linking with libc++ ?

Not yet but I will try it later. My concern is I may see some hardcoded libstdc++ inside the z3 building script. It may be the ultimate reason for some bugs.

why we (BAP) prefer statically linking the dependencies into cmxs files

In the next release, z3-static on opam will be the one built with static linking, just like the current z3 on opam.

But we have also to consider the dlopen loader, which commonly has a more strict behavior than the operating system dynamic loader.

It’s good to know this! I suddenly realize why sometimes (with incomplete z3) the ocamlopt works but not ocamlc.

It’s hard to drop mac yet. It’s still the mainstream in my small lab.

2 Likes

But we have also to consider the dlopen loader, which commonly has a more strict behavior than the operating system dynamic loader.

May I confirm that dlopen loader is also used inside the linker at some moment, or the linker used by ocaml also requires some symbols must be filled in the linking?

Then I can explain this error (and discussion z3#5622) on the PR which I would like to do some post-install testing on opam z3. This works on my local machine but fails on the CI.

ocamlfind ocamlopt -o  ml_example_shared -thread -package z3 -linkpkg  ../examples/ml/ml_example.ml
File "../examples/ml/ml_example.ml", line 1:
Error: Error on dynamically loaded library: /home/vsts/.opam/default/lib/stublibs/dllz3ml.so: /home/vsts/.opam/default/lib/stublibs/dllz3ml.so: undefined symbol: Z3_mk_u32string
make: *** [Makefile:5029: ml_example_shared.byte] Error 2

I could guess that the transient latest commit of z3 has missing parts for the ocaml binding, which breaks the linker used in ocamlopt. No runtime loader is involved at this moment.

No, it is not used. A normal operating system linker is used, you can use the -verboseoptionocamlopt` to see how it is invoked.

You can also use nm -C dllz3ml.so and look for all symbols that are undefined (marked with U). It is fine if the symbols are undefined in a library, as long as they are provided by some other library. By default, the operating system linker (ld) allows undefined symbols that are not used that what dlopen doesn’t allow.

The next step that I would make, is to go through the build artifacts of z3 and find where it is defined. It looks like a new symbol recently introduced so it could be that they forgot to re-export it, for example, i.e., with extern "C".

Thanks! I will check it later.