Cargo/Opam packaging of a Rust/OCaml project

It is currently unclear how to package and distribute an OCaml package that binds a piece of Rust code. Rust is a popular language, especially for cryptography, and it’s safer than C, making it an important technology for the OCaml ecosystem.

In this post we explore a few issues and possible solutions to integrate Cargo and Opam, hoping to start a discussion with the community. At the end we describe the concrete use case of Tezos that sparked this investigation at Nomadic Labs.

The main problem

Rust does not have a stable ABI, so the approach taken is to always recompile from source in a local environment (comparable to a local _opam switch).
The official approach is to discourage the installation of compiled libraries and only install executables.

Opam, on the other hand, allows installing compiled libraries, so a library that binds a piece of Rust code and links it breaks this Rust invariant.
In particular there is the risk of linking to several versions of the same Rust dependency because the constraints are solved separately.

A possible solution for Opam to respect this invariant could be that at each installation of an executable package containing Rust code, all installed libraries containing Rust code should be recompiled.

A somewhat similar solution is employed by Debian: “So, we can’t reasonably ship compiled versions of Rust libraries. Instead, library packages ship source code, and application packages build all the library crates they use from source.”

How to bind Rust

The approach used to bind an existing Rust crate foo is to:

  • write a pure Rust crate foo-ffi which exposes selected functions of foo in a C compatible way, using Rust standard FFI.
  • write a binding ocaml-foo which binds foo-ffi, either with hand-written stubs or ctypes.
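As a rough illustration of the first step, here is a minimal sketch of what a foo-ffi wrapper could look like. The function foo_checksum is a hypothetical stand-in for something exported by foo, and the name foo_ffi_checksum is also made up for this example. The snippet assumes an edition up to 2021, where the plain #[no_mangle] attribute is accepted.

```rust
// Hypothetical stand-in for a function provided by the crate `foo`.
fn foo_checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u32))
}

/// C-compatible wrapper, as it could appear in `foo-ffi`.
///
/// Safety: `ptr` must point to `len` readable bytes, or be null.
#[no_mangle]
pub unsafe extern "C" fn foo_ffi_checksum(ptr: *const u8, len: usize) -> u32 {
    // A null pointer is treated as an empty input.
    if ptr.is_null() {
        return 0;
    }
    // Rebuild a Rust slice from the raw C buffer and call the safe function.
    let data = unsafe { std::slice::from_raw_parts(ptr, len) };
    foo_checksum(data)
}
```

On the OCaml side, ocaml-foo would then declare this symbol either through a hand-written C stub or through ctypes, exactly as for a C library.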

Cargo and Opam

Cargo is roughly the equivalent of dune + opam: it takes care of compilation (using the compiler rustc), dependencies and packaging.
Cargo.toml is equivalent to an .opam file and among other things it declares dependencies.
Cargo.lock contains the result of solving the dependency constraints declared in the Cargo.toml files. It contains the exact version of each crate that will be downloaded and compiled. Lock files should be checked in with the sources of binaries to get reproducible builds, but they should not be included for libraries.
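Since Cargo.lock records exact versions, it can be inspected to detect the situation mentioned earlier, where several versions of the same crate end up in one build. The following is only a sketch: it scans the text line by line instead of using a real TOML parser, and the function names are invented for this example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// For each crate name found in a Cargo.lock text, collect the locked versions.
/// Naive line-based scan: assumes `name` precedes `version` in each entry.
fn locked_versions(lock: &str) -> BTreeMap<String, BTreeSet<String>> {
    let mut out: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    let mut name: Option<String> = None;
    for line in lock.lines() {
        let line = line.trim();
        if line == "[[package]]" {
            name = None;
        } else if let Some(v) = line.strip_prefix("name = ") {
            name = Some(v.trim_matches('"').to_string());
        } else if let Some(v) = line.strip_prefix("version = ") {
            // The top-level lock format `version = N` is skipped: no name is set yet.
            if let Some(n) = name.take() {
                out.entry(n).or_default().insert(v.trim_matches('"').to_string());
            }
        }
    }
    out
}

/// Names of crates that are locked at more than one version.
fn duplicated_crates(lock: &str) -> Vec<String> {
    locked_versions(lock)
        .into_iter()
        .filter(|(_, versions)| versions.len() > 1)
        .map(|(name, _)| name)
        .collect()
}
```

Running duplicated_crates on the committed lock file of a workspace gives a quick check that the shared dependencies were really unified.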

Finding dependencies

Online

Normally cargo build would download all required dependencies, which is forbidden by Opam’s sandbox.

We could make an exception in Opam’s sandbox for cargo build to contact exclusively the official repository crates.io.

Alternatively we could add a new backend for depext to run cargo build, much like an external package manager.

A related discussion can be found in issue 3460.

Offline

Dependencies can also be vendored using cargo vendor or can be downloaded to a local registry, as done by Debian. The two solutions are almost equivalent.

An example of vendoring can be found in batsat-ocaml.

Finding a compiled library to link to

Find the “local switch”

Every run of cargo build is done in a local switch (in opam terms) which is by default located in the directory target/release next to the Cargo.toml that defines the build.

This directory can be found by dune/opam using cargo metadata and can be customized (e.g. to be inside an opam switch).
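A sketch of how a helper could query that location: cargo metadata prints a JSON document that contains a target_directory field. The code below extracts it with a naive string search, only to stay dependency-free; a real tool would use a JSON parser. The function names are hypothetical.

```rust
use std::process::Command;

/// Extract the string value of `key` from a JSON text.
/// Naive: finds the first occurrence of the key and handles only simple
/// backslash escapes such as \\ and \", which is enough for typical paths.
fn json_string_field(json: &str, key: &str) -> Option<String> {
    let pat = format!("\"{}\"", key);
    let start = json.find(&pat)? + pat.len();
    let rest = json[start..].trim_start();
    let rest = rest.strip_prefix(':')?.trim_start();
    let rest = rest.strip_prefix('"')?;
    let mut out = String::new();
    let mut chars = rest.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Some(out),
            '\\' => out.push(chars.next()?),
            _ => out.push(c),
        }
    }
    None
}

/// Ask cargo for the build directory of the project in the current directory.
fn cargo_target_directory() -> Option<String> {
    let output = Command::new("cargo")
        .args(&["metadata", "--format-version", "1", "--no-deps"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let json = String::from_utf8(output.stdout).ok()?;
    json_string_field(&json, "target_directory")
}
```

The same search with the key workspace_root gives the root of the enclosing workspace.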

Installed in the system

An FFI library is stable with respect to C’s ABI, so it could be installed in the system. There is no cargo support for this though.

Sharing dependencies in a workspace

When building multiple crates for a project, it is important to solve all dependencies at the same time, to share as much code as possible and avoid linking two versions of the same library.

This can be achieved by defining a cargo workspace that declares the crates as members and that will result in a single build directory for all of them, with dependencies shared.

Note that this solution works at the root of a project but can’t be nested in subprojects as nested workspaces are not supported by cargo.
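To make the workspace idea concrete, here is a small sketch that generates the root Cargo.toml for a list of member crates. The member paths are placeholders and the function name is invented; paths containing quotes or backslashes would need escaping, which is not handled here.

```rust
/// Render a root Cargo.toml that declares the given crates as workspace members.
fn workspace_manifest(members: &[&str]) -> String {
    let mut manifest = String::from("[workspace]\nmembers = [\n");
    for member in members {
        // One member path per line; TOML allows the trailing comma.
        manifest.push_str(&format!("    \"{}\",\n", member));
    }
    manifest.push_str("]\n");
    manifest
}
```

Writing this file at the root of the project and running cargo build there produces a single Cargo.lock and a single target directory for all members.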

Tezos case

There are currently two Rust libraries that Nomadic Labs wrote OCaml bindings for, librustzcash and bls12-381. These libraries share a large number of dependencies.

The solution currently used to build Tezos from source is to declare a Cargo workspace at the root of the project with the two vendored Rust libraries as members. This ensures that dependencies are solved at the same time and the result can be inspected in the resulting Cargo.lock which is committed to git.

Inside both OCaml bindings Dune uses cargo metadata to find the workspace root and the compiled .a libraries to link to.

The question now is how to obtain a similar result when installing Tezos from an Opam package. Hence the above discussion.

Here is an example of such a project: https://github.com/zshipko/ocaml-rust-starter

Hello!

I’d be interested in discussing this, and seeing how we could implement better support in opam :slight_smile:

Let’s use an example where you have opam package pack, which depends on foocaml and barcaml, each of them depending on rust libraries resp. foorust and barrust, with intersecting dependencies.

The compilation constraints in Rust and OCaml are basically the same IIUC, so while Cargo chooses the more basic approach of recompiling everything at app generation time, the way opam recompiles every dependent package at the slightest change of a library should be applicable to Rust as well.

The trouble comes from the fact that, well, we have to choose one or the other, and they don’t agree. In particular, both OCaml and Rust need dependency resolution, and opam does that once before anything else, while Rust would do it when called for building (or vendoring), e.g. at pack compilation time.

So there could be a few approaches there:

  1. full opam way: technically, Rust packages could probably be encoded as opam packages, and opam resolution would select their versions (avoiding linking of the same library, or different versions thereof, multiple times). Then handling separate compilation of Rust dependencies seems highly theoretical, though.
  2. the Cargo way: let opam do the resolution (for opam packages anyway), but then, the compilation of Rust or any packages depending on Rust would be delayed, just aggregating their dependencies until an application package is reached. At that point, the build of pack would create the Cargo workspace and build everything.
  3. the way between: would it be possible to compile the OCaml bindings without having actually compiled the Rust libraries yet? In this case, we could stay on the opam way for OCaml code, and delay the compilation of the actual Rust libraries to linking time. We would probably need to resolve the Rust libraries versions beforehand, though, so that would mean some encoding of them in the opam repository. So that would go:
     ① let opam resolve the package versions (Rust through conf-rust-* packages, and OCaml)
     ② the conf-rust-* packages just gather information on what will be needed and don’t compile anything (note: they should also pre-download to cache…)
     ③ the OCaml bindings are compiled in advance (?)
     ④ before linking time, in any opam package, create a Rust workspace, gathering the info from all conf-rust-* packages in the dependency tree of the opam package.

I am not sure it’s the most satisfying (but you can’t get the best version resolution unless you have all the dependency info in one solver and in one pass), but 2. seems to be the most effective solution without too much effort at the moment; in fact, if we assume that your OCaml packages use dune to build, that should help a lot with following the Cargo process. I will attempt to sketch something below.

I am not yet completely familiar with the Rust/Cargo ecosystem, so please correct me if I made any wrong assumptions here!


Rough sketch of how we could go (assuming all OCaml packages dependent on Rust use dune to build!)

  • For packages foocaml and barcaml:
    • instead of building, just copy the source tree to e.g. <prefix>/rust-bindings/foocaml
  • For package pack:
    • copy <prefix>/rust-bindings/XXX to the build directory, for any XXX appearing in your dependencies (you could use the opam variables <pkg>:depends)
    • gather in the same way the Rust dependencies, and create the corresponding Cargo workspace. Resolution of Rust dependencies happens at this point, a bit late but well.
    • dune build should be able to put everything together

There remains the issue of downloads, which opam doesn’t allow during builds. This is easy to relax in the sandboxing script (on Linux, remove the --unshare-net from ~/.opam/opam-init/hooks/sandbox.sh); otherwise, we need conf-rust-* packages that can prefetch the Rust sources to be used later by Cargo, although that doesn’t seem easy to achieve (we could leverage the extra-sources field, but we would need to know the URLs in advance).

conf-rust-* packages sound interesting, but I worry about having to duplicate many packages from crates.io on opam and ensuring they’re up to date.

Currently it seems like the cleanest way to build Rust code with dune/opam is with a cargo workspace and vendored dependencies. I don’t have a huge issue with this method, although there is definitely room for improvement.

One of the most tedious parts of writing a dune file that calls cargo is locating the resulting Rust libraries - I’m not sure exactly what it would look like, but if there was a way to create better Cargo/dune integration, that would be extremely helpful in standardizing this kind of build process.

We discussed solutions 1 and 2 here at Nomadic and, personally, I’m in favor of solution 2 the “cargo way”. It seems tricky and a lot of work to encode one package manager into another (with conf- packages or other ways) and I don’t see the advantage.
The reason for this post however was precisely to see if I’m missing some important use case for which the “cargo way” doesn’t work.

Solution 3 is interesting as well. I’m not sure why you need to resolve Rust dependencies beforehand though. Can you elaborate?
Also you won’t be able to run tests for these libraries, is that a problem?

About the sandboxing, that depends on the security model Opam wants to guarantee. If crates.io is considered a reputable source then Opam could allow cargo to contact exclusively that domain (not sure it’s doable with the current sandbox script).

From looking at the various prototypes we’ve built around this (thanks @zshipko!), I’d suggest this:

  • get Cargo and Dune build integration working. This is clearly a prerequisite for anything, since it requires opinionated build systems on either side (that is, Cargo builds on rustc, and dune builds on ocamlopt). With this integration, we will have a source tree arrangement that ensures that Cargo and Dune understand where each other’s build artefacts are.

  • get opam to understand how to deal with source code arrangements for monorepos. This actually simplifies opam’s role: it has to arrange source code in the right place, and then execute a single build command. This is not only useful for Rust/OCaml, but also for larger OCaml monorepo projects as well which could execute a single dune build rather than 50+ as happens at the moment. There are a number of options for how to integrate cargo here to fetch its distfiles, but cargo vendor would work today to start with.

To get us started, would you be so kind as to open up a Dune/Cargo feature request on https://github.com/ocaml/dune/issues, @paracetamolo? We can do some investigation there and understand what’s needed at the next Dune meeting. Once that’s done, we can figure out the opam end.

Another way would be to add an option in the opam file. We can have something like external-build-systems: [ "rust" {">=1.36.0"} ] at the top level of the opam file. For each build system, you define a structure to define the dependencies. For Rust, it might be the workspace. In addition to that, the corresponding compiler version can be downloaded and added in $OPAM_SWITCH/bin (let’s not forget that multiple versions of a build system can be used for different projects, and it should not depend on a global installed version).
When downloading the opam dependencies, the external build system dependencies are downloaded at the same time. For Cargo, you can use cargo fetch (see https://doc.rust-lang.org/cargo/commands/cargo-fetch.html). Of course, this assumes that external build systems offer such a feature (it would be a sine qua non condition for being supported by opam). It should be possible to restrict downloads to a list of URLs, like the cargo repository or some project hosting services like GitHub/GitLab. It also requires tricking the Cargo configuration into only looking for sources in the opam switch! $CARGO_HOME can be used (see https://doc.rust-lang.org/cargo/guide/cargo-home.html). For info, the Rust sources will be downloaded into different directories depending on the dependency type (from a registry, from git, path, etc).
On the building side, since the dependencies are now on the user’s machine, in a specific directory inside the opam switch, with the correct version of cargo set up correctly, the user can call cargo from the build section of the opam file. It also allows the user to use the external build systems in other sections like run-test!
This looks like option 2 with some additional ideas.
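A sketch of the $CARGO_HOME part of this idea: the helper below prepares a cargo invocation whose home directory lives inside the switch, optionally with --offline so that nothing is downloaded during the build. The directory name cargo-home and the function name are assumptions made for this example; opam itself provides nothing like this today.

```rust
use std::path::Path;
use std::process::Command;

/// Prepare a cargo invocation that keeps its registry cache inside the switch.
/// `switch_prefix` is the root of the opam switch (for example a local _opam).
fn cargo_in_switch(switch_prefix: &Path, subcommand: &str, offline: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg(subcommand);
    if offline {
        // Forbid network access: only already fetched sources can be used.
        cmd.arg("--offline");
    }
    // "cargo-home" is a hypothetical directory name chosen for this sketch.
    cmd.env("CARGO_HOME", switch_prefix.join("cargo-home"));
    cmd
}
```

The download phase would call cargo_in_switch(prefix, "fetch", false), and the sandboxed build phase cargo_in_switch(prefix, "build", true), each followed by .status() to actually run the command.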

Solution 3 is interesting as well. I’m not sure why you need to resolve Rust dependencies beforehand though. Can you elaborate?

It’s just that otherwise, we would resolve opam dependencies first (ignoring any constraints specific to Rust), then install all opam dependencies, and only then ask Rust to resolve the gathered dependencies, which might lead to a conflict. E.g. imagine that foorust and barrust were incompatible, or depended on incompatible versions of another library, because opam didn’t choose compatible versions of the bindings.

Sorry for bumping this old thread, but it seems the solution is still to be found for this problem.

I’ve achieved some success binding asynchronous Rust to asynchronous OCaml (Lwt flavour, but that’s a technicality), and have open-sourced it (see the ocaml-lwt-interop project).

But it’s very challenging right now to use this in your own libraries. The currently established workflow of statically linking everything Rust-related into one large .a, which is linked into the OCaml binary at the end, does not scale beyond a proprietary codebase in a monorepo.

There is a discussion on this matter in this ocaml-rs issue. It turns out that Rust is trying to formalize an official approach to separate compilation with external linking (see rust-lang#73632), but it seems that it won’t help us much, as the best strategy is to pull all Rust sources for each binary into a single staticlib which will be linked into that binary.

Ideally some tool should scan all the dependencies of an OCaml binary, and for each OCaml library which embeds a Rust crate as foreign stubs, that tool should compose a top level crate for that binary, which should look something like this:

src/lib.rs:

pub use ocaml_lib1;
pub use ocaml_lib2;
pub use ocaml_lwt_interop;

Cargo.toml:

[package]
name = "ocaml-rust-deps"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["staticlib", "cdylib","rlib"]
path = "src/lib.rs"

[dependencies]
ocaml-lib1 = { path = "_opam/lib/rust-lib1" }
ocaml-lib2 = { path = "_opam/lib/rust-lib2" }
ocaml-lwt-interop = { path = "some-local-dir/ocaml-lwt-interop" }

and dune:

(rule
 (targets libocaml_rust_deps.a dllocaml_rust_deps.so)
 (deps
  (glob_files_rec ../*.rs)
  (glob_files_rec ../*.toml)
  ../Cargo.lock
  ../.cargo/config
  (source_tree ../vendor))
 (action
  (no-infer
   (progn
    (chdir
     %{workspace_root}
     (run
      cargo
      build
      --target-dir
      %{workspace_root}/../../_build_rust
      --release
      --offline
      --package
      ocaml-rust-deps))
    (run
     mv
     %{workspace_root}/../../_build_rust/release/libocaml_rust_deps.a
     libocaml_rust_deps.a)
    (run
     mv
     %{workspace_root}/../../_build_rust/release/libocaml_rust_deps.so
     dllocaml_rust_deps.so)))))

(library
 (name rust_deps_lib)
 (foreign_archives ocaml_rust_deps)
 (c_library_flags
  (-lpthread -lc -lm)))

(The ugly hack of moving the Rust build dir outside dune’s build directory is there to speed up the build: it seems dune removes it on any change if it’s inside the build dir.)

The resulting rust_deps_lib just needs to be included in the binary. This approach works and seems to be future-proof.

The open question is how such a tool can be built and what kind of tool it needs to be. Will dune be able to read some library metadata out of an already built library somewhere inside an opam switch, to figure out whether it contains a Rust crate that it potentially needs to take care of? Should that metadata go into .opam files, so that opam could read it and build a list of Rust dependencies for a given .opam file? Does opam support arbitrary metadata in opam files? If yes, some external tool can probably be prototyped for this: it should read all local .opam files, build the dependency tree, scan it for that specific metadata, and generate a dune library like the one I outlined above for each .opam package. Does that sound sane? :thinking:
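To give an idea of the generation step of such a tool, here is a minimal sketch that renders the src/lib.rs and the [dependencies] section of the top-level crate from a list of discovered Rust stubs. How the list is discovered (opam metadata, dune, something else) is exactly the open question and is not covered; the RustStub type and both function names are hypothetical.

```rust
/// A Rust crate embedded as foreign stubs by some OCaml library.
/// Hypothetical type: how this information is discovered is left open.
struct RustStub {
    crate_name: String, // package name as written in Cargo.toml
    path: String,       // where the crate sources were installed
}

/// Render src/lib.rs: re-export every stub crate so it ends up in the staticlib.
fn umbrella_lib_rs(stubs: &[RustStub]) -> String {
    stubs
        .iter()
        // In Rust source, hyphens in a package name become underscores.
        .map(|s| format!("pub use {};\n", s.crate_name.replace('-', "_")))
        .collect()
}

/// Render the [dependencies] section of the generated Cargo.toml.
fn umbrella_dependencies(stubs: &[RustStub]) -> String {
    let mut out = String::from("[dependencies]\n");
    for s in stubs {
        out.push_str(&format!("{} = {{ path = \"{}\" }}\n", s.crate_name, s.path));
    }
    out
}
```

The [package] and [lib] sections stay fixed, as in the Cargo.toml shown above, so only these two fragments depend on the dependency tree of the binary.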
