Cargo/Opam packaging of a Rust/OCaml project

It is currently unclear how to package and distribute an OCaml package that binds a piece of Rust code. Rust is a popular language, especially for cryptography, and it’s safer than C, making it an important technology for the OCaml ecosystem.

In this post we explore a few issues and possible solutions to integrate Cargo and Opam, hoping to start a discussion with the community. At the end we describe the concrete use case of Tezos that sparked this investigation at Nomadic Labs.

The main problem

Rust does not have a stable ABI, and the approach taken is to always recompile from sources in a local environment (like a local _opam switch).
The official approach is to discourage the installation of compiled libraries and to only install executables.

Opam, on the other hand, allows installing compiled libraries, so a library that binds a piece of Rust code, and links it, breaks Rust’s invariant.
In particular there is the risk of linking to several versions of the same Rust dependency because the constraints are solved separately.

A possible solution for Opam to respect this invariant could be that at each installation of an executable package containing Rust code, all installed libraries containing Rust code should be recompiled.

A somewhat similar solution is employed by Debian: “So, we can’t reasonably ship compiled versions of Rust libraries. Instead, library packages ship source code, and application packages build all the library crates they use from source.”

How to bind Rust

The approach used to bind an existing Rust crate foo is to:

  • write a pure Rust crate foo-ffi which exposes selected functions of foo in a C-compatible way, using Rust’s standard FFI.
  • write a binding ocaml-foo which binds foo-ffi, either with hand-written stubs or with ctypes (a minimal sketch follows this list).
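For concreteness, here is a minimal sketch of the two layers; the crate foo and all function names are hypothetical, and ctypes is shown as just one of the two options:

foo-ffi/src/lib.rs:

// Expose a function of the (hypothetical) crate `foo` with C calling
// convention and an unmangled symbol so it can be linked from OCaml.
// The crate is built with crate-type = ["staticlib"] (see below).
#[no_mangle]
pub extern "C" fn foo_add(a: u64, b: u64) -> u64 {
    foo::add(a, b)
}

ocaml-foo/foo.ml:

(* Bind the symbol above using ctypes-foreign; hand-written C stubs
   would work just as well. *)
open Ctypes

let foo_add =
  Foreign.foreign "foo_add" (uint64_t @-> uint64_t @-> returning uint64_t)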

Cargo and Opam

Cargo is roughly the equivalent of dune + opam: it takes care of compilation (using the compiler rustc), of dependencies, and of packaging.
Cargo.toml is equivalent to an .opam file and among other things it declares dependencies.
Cargo.lock contains the result of solving the dependency constraints declared in the Cargo.toml files. It contains the exact version of each crate that will be downloaded and compiled. Lock files should be checked into the sources of binaries to get reproducible builds, but they should not be included for libraries.
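For reference, a minimal Cargo.toml for the hypothetical foo-ffi crate sketched above might read:

[package]
name = "foo-ffi"
version = "0.1.0"
edition = "2018"

[lib]
# A static library with a C-compatible ABI, for the OCaml binding to link.
crate-type = ["staticlib"]

[dependencies]
foo = "1.0"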

Finding dependencies

Online

Normally cargo build would download all required dependencies, which is forbidden by Opam’s sandbox.

We could make an exception in Opam’s sandbox for cargo build to contact exclusively the official repository crates.io.

Alternatively we could add a new backend for depext to run cargo build, much like an external package manager.

A related discussion can be found in issue 3460.

Offline

Dependencies can also be vendored using cargo vendor, or they can be downloaded to a local registry, as Debian does. The two solutions are almost equivalent.
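As a sketch, vendoring boils down to the following; the [source] snippet is the one cargo vendor itself prints for .cargo/config:

# Download all dependencies into ./vendor
cargo vendor

# Redirect cargo from the network to the vendored sources by adding
# to .cargo/config:
#
#   [source.crates-io]
#   replace-with = "vendored-sources"
#
#   [source.vendored-sources]
#   directory = "vendor"

# After that, builds work without network access:
cargo build --offline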

An example of vendoring can be found in batsat-ocaml.

Finding a compiled library to link to

Find the “local switch”

Every run of cargo build is done in a local switch (in opam terms) which is by default located in the directory target/release next to the Cargo.toml that defines the build.

This directory can be found by dune/opam using cargo metadata and can be customized (e.g. to be inside an opam switch).
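For instance, it can be queried from a shell as follows (a sketch, assuming jq is available):

# target_directory is part of cargo metadata's JSON output; the compiled
# .a files end up under <target_directory>/release.
cargo metadata --format-version 1 --no-deps | jq -r '.target_directory'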

Installed in the system

An FFI library is stable with respect to C’s ABI, so it could be installed in the system. There is no cargo support for this though.

Sharing dependencies in a workspace

When building multiple crates for a project, it is important to solve all dependencies at the same time, to share as much code as possible and to avoid linking two versions of the same library.

This can be achieved by defining a cargo workspace that declares the crates as members and that will result in a single build directory for all of them, with dependencies shared.
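A minimal sketch of such a root Cargo.toml, with hypothetical member names:

# All members share a single Cargo.lock and a single target/ directory,
# so common dependencies are resolved once and compiled once.
[workspace]
members = ["foo-ffi", "bar-ffi"]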

Note that this solution works at the root of a project but can’t be nested in subprojects as nested workspaces are not supported by cargo.

Tezos case

There are currently two Rust libraries that Nomadic Labs wrote OCaml bindings for, librustzcash and bls12-381. These libraries share a large number of dependencies.

The solution currently used to build Tezos from source is to declare a Cargo workspace at the root of the project with the two vendored Rust libraries as members. This ensures that dependencies are solved at the same time and the result can be inspected in the resulting Cargo.lock which is committed to git.

Inside both OCaml bindings Dune uses cargo metadata to find the workspace root and the compiled .a libraries to link to.

The question now is how to obtain a similar result when installing Tezos from an Opam package. Hence the above discussion.


Here is an example of such a project: https://github.com/zshipko/ocaml-rust-starter

Hello!

I’d be interested in discussing this, and seeing how we could implement better support in opam :slight_smile:

Let’s use an example where you have opam package pack, which depends on foocaml and barcaml, each of them depending on rust libraries resp. foorust and barrust, with intersecting dependencies.

The compilation constraints in Rust and OCaml are basically the same IIUC, so while Cargo chooses the more basic approach of recompiling everything at app generation time, the way opam recompiles every dependent package at the slightest change of a library should be applicable to Rust as well.

The trouble comes from the fact that, well, we have to choose one or the other, and they don’t agree. In particular, both OCaml and Rust need dependency resolution, and opam does that once before anything else, while Rust would do it when called for building (or vendoring), e.g. at pack compilation time.

So there could be a few approaches there:

  1. full opam way: technically, Rust packages could probably be encoded as opam packages, and opam resolution would select their versions (avoiding linking of the same library, or different versions thereof, multiple times). Then handling separate compilation of Rust dependencies seems highly theoretical, though.
  2. the Cargo way: let opam do the resolution (for opam packages anyway), but then, the compilation of Rust or any packages depending on Rust would be delayed, just aggregating their dependencies until an application package is reached. At that point, the build of pack would create the Cargo workspace and build everything.
  3. the way between: would it be possible to compile the OCaml bindings without having actually compiled the Rust libraries yet? In this case, we could stay on the opam way for OCaml code, and delay the compilation of the actual Rust libraries to linking time. We would probably need to resolve the Rust library versions beforehand, though, so that would mean some encoding of them in the opam repository. So that would go: ① let opam resolve the package versions (Rust through conf-rust-* packages, and OCaml) ② the conf-rust-* packages just gather information on what will be needed and don’t compile anything (note: they should also pre-download to cache…) ③ the OCaml bindings are compiled in advance (?) ④ before linking time, in any opam package, create a Rust workspace, gathering the info from all conf-rust-* packages in the dependency tree of the opam package. (A rough sketch of such a conf-rust-* package follows this list.)
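Such a conf-rust-* package could look roughly like this; the sketch is entirely hypothetical (no such packages exist in opam-repository today), with foorust standing for an arbitrary crate:

opam-version: "2.0"
synopsis: "Virtual package tracking the foorust Rust crate"
# Installs and compiles nothing: it only lets the opam solver pick a crate
# version, and pre-fetches the sources for a later offline cargo build.
depends: [ "conf-rust" ]
extra-source "foorust-1.2.3.crate" {
  src: "https://crates.io/api/v1/crates/foorust/1.2.3/download"
}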

I am not sure it’s the most satisfying (but you can’t get the best version resolution unless you have all the dependency info in one solver and in one pass), but 2. seems to be the most effective solution without too much effort at the moment; in fact, if we assume that your OCaml packages use dune to build, that should help a lot with following the Cargo process. I will attempt to sketch something below.

I am not yet completely familiar with the Rust/Cargo ecosystem, so please correct me if I made any wrong assumptions here!


Rough sketch of how we could go (assuming all OCaml packages dependent on Rust use dune to build!)

  • For packages foocaml and barcaml:
    • instead of building, just copy the source tree to e.g. <prefix>/rust-bindings/foocaml
  • For package pack:
    • copy <prefix>/rust-bindings/XXX to the build directory, for any XXX appearing in your dependencies (you could use the opam variables <pkg>:depends)
    • gather in the same way the Rust dependencies, and create the corresponding Cargo workspace. Resolution of Rust dependencies happens at this point, a bit late but well.
    • dune build should be able to put everything together

There remains the issue of downloads, which opam doesn’t allow during builds. This is easy to relax in the sandboxing script (on Linux, remove the --unshare-net from ~/.opam/opam-init/hooks/sandbox.sh); otherwise, we need conf-rust-* packages that can prefetch the Rust sources to be used later by Cargo, although that doesn’t seem easy to achieve (we could leverage the extra-sources field, but we would need to know the URLs in advance).

conf-rust-* packages sound interesting, but I worry about having to duplicate many packages from crates.io on opam and ensuring they’re up to date.

Currently it seems like the cleanest way to build Rust code with dune/opam is with a cargo workspace and vendored dependencies. I don’t have a huge issue with this method, although there is definitely room for improvement.

One of the most tedious parts of writing a dune file that calls cargo is locating the resulting Rust libraries - I’m not sure exactly what it would look like, but if there was a way to create better Cargo/dune integration, that would be extremely helpful in standardizing this kind of build process.


We discussed solutions 1 and 2 here at Nomadic and, personally, I’m in favor of solution 2 the “cargo way”. It seems tricky and a lot of work to encode one package manager into another (with conf- packages or other ways) and I don’t see the advantage.
The reason for this post however was precisely to see if I’m missing some important use case for which the “cargo way” doesn’t work.

Solution 3 is interesting as well. I’m not sure why you need to resolve Rust dependencies beforehand, though. Can you elaborate?
Also, you won’t be able to run tests for these libraries; is that a problem?

About the sandboxing, that depends on the security model Opam wants to guarantee. If crates.io is considered a reputable source, then Opam could allow cargo to contact exclusively that domain (not sure that’s doable with the current sandbox script).

From looking at the various prototypes we’ve built around this (thanks @zshipko!), I’d suggest this:

  • get Cargo and Dune build integration working. This is clearly a prerequisite for anything, since it requires opinionated build systems on either side (that is, Cargo builds on rustc, and dune builds on ocamlopt). With this integration, we will have a source tree arrangement that ensures that Cargo and Dune understand where each other’s build artefacts are.

  • get opam to understand how to deal with source code arrangements for monorepos. This actually simplifies opam’s role: it has to arrange source code in the right place, and then execute a single build command. This is not only useful for Rust/OCaml, but also for larger OCaml monorepo projects, which could execute a single dune build rather than the 50+ builds that happen at the moment. There are a number of options for how to integrate cargo here to fetch its distfiles, but cargo vendor would work today to start with.

To get us started, would you be so kind as to open up a Dune/Cargo feature request on https://github.com/ocaml/dune/issues, @paracetamolo? We can do some investigation there and understand what’s needed at the next Dune meeting. Once that’s done, we can figure out the opam end.


Another way would be to add an option in the opam file. We could have something like external-build-systems: [ "rust" {">=1.36.0"} ] at the top level of the opam file. For each build system, you define a structure to declare the dependencies. For Rust, it might be the workspace. In addition to that, the corresponding compiler version can be downloaded and added to $OPAM_SWITCH/bin (let’s not forget that multiple versions of a build system can be used for different projects, and it should not depend on a globally installed version).
When downloading the opam dependencies, the external build system’s dependencies are downloaded at the same time. For Cargo, you can use cargo fetch (see https://doc.rust-lang.org/cargo/commands/cargo-fetch.html). Of course, this supposes that external build systems support such a feature (it would be a condition sine qua non to be supported by opam). It should be possible to enforce only a list of URLs, like the cargo repository or some project hosting services such as GitHub/GitLab. It also requires tweaking the Cargo config to only look for sources in the opam switch! $CARGO_HOME can be used for this (see https://doc.rust-lang.org/cargo/guide/cargo-home.html). For info, the Rust sources will be downloaded into different directories depending on the dependency type (from a registry, from git, a path, etc.).
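A sketch of what the download step could then do (paths illustrative):

# Keep cargo's caches inside the opam switch rather than in ~/.cargo
export CARGO_HOME="$OPAM_SWITCH_PREFIX/.cargo"

# Download, but do not build, exactly what Cargo.lock specifies
cargo fetch --locked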
On the building side, the dependencies are now on the user’s machine, in a specific directory, with cargo set up correctly, at the correct version, in the opam switch, so the build section in the opam file can be adapted accordingly. It also allows the external build systems to be used in other sections, like run-test!
It does look like option 2 with some additional ideas.

Solution 3 is interesting as well. I’m not sure why you need to resolve Rust dependencies beforehand, though. Can you elaborate?

It’s just that otherwise, we would resolve opam dependencies first (ignoring any constraints specific to Rust), then install all opam dependencies, and only then ask Rust to resolve the gathered dependencies, which might lead to a conflict. E.g. imagine that foorust and barrust were incompatible, or depended on incompatible versions of another library, because opam didn’t choose compatible versions of the bindings.

Sorry for bumping this old thread, but it seems a solution to this problem is still to be found.

I’ve achieved some success binding asynchronous Rust to asynchronous OCaml (the Lwt flavour, but that’s a technicality), and have it open-sourced (see the ocaml-lwt-interop project).

But it’s very challenging right now to use this in your own libraries. The currently established workflow, statically linking everything Rust-related into a large .a that is linked into the OCaml binary at the end, does not scale beyond a proprietary codebase in a monorepo.

There is a discussion on this matter in this ocaml-rs issue. It turns out that Rust is trying to formalize some official approach to separate compilation with external linking (see rust-lang#73632), but it seems that it won’t help us much, as the best strategy is to pull all Rust sources for each binary into a single staticlib which will be linked into that binary.

Ideally some tool should scan all the dependencies of an OCaml binary, and for each OCaml library which embeds a Rust crate as foreign stubs, that tool should compose a top level crate for that binary, which should look something like this:

src/lib.rs:

pub use ocaml_lib1;
pub use ocaml_lib2;
pub use ocaml_lwt_interop;

Cargo.toml:

[package]
name = "ocaml-rust-deps"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["staticlib", "cdylib","rlib"]
path = "src/lib.rs"

[dependencies]
ocaml-lib1 = { path = "_opam/lib/rust-lib1" }
ocaml-lib2 = { path = "_opam/lib/rust-lib2" }
ocaml-lwt-interop = { path = "some-local-dir/ocaml-lwt-interop" }

and dune:

(rule
 (targets libocaml_rust_deps.a dllocaml_rust_deps.so)
 (deps
  (glob_files_rec ../*.rs)
  (glob_files_rec ../*.toml)
  ../Cargo.lock
  ../.cargo/config
  (source_tree ../vendor))
 (action
  (no-infer
   (progn
    (chdir
     %{workspace_root}
     (run
      cargo
      build
      --target-dir
      %{workspace_root}/../../_build_rust
      --release
      --offline
      --package
      ocaml-rust-deps))
    (run
     mv
     %{workspace_root}/../../_build_rust/release/libocaml_rust_deps.a
     libocaml_rust_deps.a)
    (run
     mv
     %{workspace_root}/../../_build_rust/release/libocaml_rust_deps.so
     dllocaml_rust_deps.so)))))

(library
 (name rust_deps_lib)
 (foreign_archives ocaml_rust_deps)
 (c_library_flags
  (-lpthread -lc -lm)))

(The ugly hack of moving the Rust build dir into the source directory is there to speed up the build: dune seems to remove it on any change when it’s inside the build dir.)

The resulting rust_deps_lib just needs to be linked into the binary. This approach works and seems future-proof.

The open question is how such a tool can be built and what kind of tool it needs to be. Would dune be able to read some library metadata out of an already-built library somewhere inside an opam switch, to figure out whether it contains a Rust crate that dune needs to take care of? Should that metadata go into .opam files, so that opam could read it and build a list of Rust dependencies for a given .opam file? Does opam support arbitrary metadata in opam files? If yes, an external tool could probably be prototyped for this: it would read all local .opam files, build the dependency tree, scan it for that specific metadata, and generate a dune library like I outlined above for each .opam package. Does that sound sane? :thinking:


After a lot of trial and error internally at work, it looks like I managed to build something that’s somewhat usable right now with stock Opam, Dune and Cargo, and that scales to complex dependency hierarchies.

The idea is quite simple: whenever an Opam package has Rust stubs, we mark it as such in its opam file using the x-rust-stubs-crate metadata field, which Opam otherwise ignores. A special tool, rust-staticlib-gen, uses the Opam libraries to traverse the dependency graph of the root package (originating from the local project’s .opam file), collecting all the Rust stub crates required to fulfill the linking. This tool generates a Cargo crate with the so-called Rust staticlib, which depends on all the crates identified during the scan of the Opam dependency tree, with exact version constraints to ensure compatibility. As a result, there is a dune library which drags this Rust staticlib into the final executables. Such staticlibs are built in top-level projects whenever executables need to be linked. Or they might be built in the projects with bindings, as testing executables are still executables and require all external functions to be present to make the linker happy.
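For illustration, the relevant part of such an opam file is a single extra field (the crate name below is made up):

# foocaml.opam: opam ignores unknown x-* fields, so this is pure metadata
# that rust-staticlib-gen picks up while traversing the dependency graph.
x-rust-stubs-crate: "foo-ffi"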

Some attempt was made to hide the scary linker errors that appear whenever some crate is missing from the staticlib, or when no staticlib was provided at all during the linking phase (OCaml libraries that depend on Rust bits do not express this dependency to dune directly). Libs that require the Rust staticlib declare a dependency on a rust-staticlib virtual library, and the generated staticlib claims to implement that virtual library. The expectation is that the user hits an error saying the virtual library is missing an implementation, googles what that library is, and finds documentation on how to generate the staticlib in their own project to resolve this. Not super elegant, but it looks like a way forward right now.

Probably this solution could be integrated further into dune, so that dune knows that certain libraries require Rust crates to be built and linked into the final executables? I’m not sure how that would work right now.

So, this is the tooling that I built in an attempt to solve this problem: GitHub - Lupus/rust-staticlib-gen: A tool to generate static library with all transitive Rust dependencies of an OCaml project
Any feedback would be much appreciated.


I guess that’s not going to be the kind of feedback you are seeking. But here it goes.

It looks like an impressive work of engineering (in the sense of solving a problem under a given set of technical constraints), but from a design perspective (in the sense of solving problems and caring about what you subject users to) I think it shows that rust doesn’t seem to care about being interoperable with other languages; which is not surprising, since everything should be rewritten in rust. This looks absolutely horrendous – not what you did, what you had to do, assuming you did your best.

I’m not sure that trying to cope with that state of affairs (i.e. the engineering viewpoint) is a good idea in the long term. OCaml and opam have a pretty good story for interoperating with C, and I don’t think it should be more complicated than that. If rust is unable to provide that interoperability experience and does not care, then I think the answer should simply be not to use it, instead of trying to cope with the complexity their broken system enforces on you.

One good outcome of this is that perhaps people who still think that your build system and your package manager should be the same thing could finally take a cue that it’s not a very good idea. Works wonders for newcomers, casual users and closed minds, but is otherwise totally impractical for real work. Alas one thing I eventually learned is that, unexpectedly, programmers are not technically rational at all :–)


cargo is fine. source: millions of rust developers shipping to prod with it.

I don’t mean to stake a position on cargo or rust dev tooling here, but I would like to voice support for critical evaluation of technical solutions rather than appeals to popularity. IMO, it is well worth entertaining critical evaluations even (and especially) of popular solutions; this lets us learn from (and not reproduce) possible defects.

Consider how unhelpful “C is fine. source: millions of C developers shipping to prod with it.” is as a reply to people trying to discuss C’s imperfections.

Have you considered putting your metadata at the findlib level rather than the opam level?

I mean, place something like the following in the root directory of every project that has Rust stubs (using Dune’s META template convention):

# filename: META.<library name>.template
rust_stubs_crate = "..."

And then when you need to link an OCaml executable with a monolithic Rust static library, you do three things …

First, introduce a (library (name abc) ...) that has most of the dependencies of your OCaml executable; the real requirement is that dependencies need to transitively include the Rust dependencies.

Second, use a Dune rule that:

  1. Depends on %{project_root}/META.abc which will have the correct requires metadata field necessary for the next step.
  2. Calls ocamlfind query to get the transitive closure of META.abc, including all the rust_stubs_crate fields in that closure (a sketch follows this list). (I’m glossing over the fact that META.abc won’t be installed yet to the findlib installation, so you’ll need to use the findlib library to compute the transitive closure.)
  3. Run your rust-staticlib-gen (?) to create a Rust static library def from that transitive closure.
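For step 2, the query could look something like this sketch (assuming findlib’s %(property) format placeholders; exact invocation to be double-checked):

# Print, for the transitive closure of abc, each package name together with
# the value of its rust_stubs_crate META variable (empty when unset).
ocamlfind query -recursive -format "%p %(rust_stubs_crate)" abc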

Third, create the (executable (libraries abc) (foreign_archives def) ...). Any OCaml external somefunc : unit -> unit = "rust_def_somefunc" expressions should now be linked correctly.

One (selfish) benefit of this approach is that I could use the new META fields in my own build systems if I ever wanted Rust. That generalizes to other build systems.

The problem @dbuenzli is talking about isn’t using cargo (or dune or go or cabal/stack or …) to build the language it is intended to build (that that works shouldn’t be surprising ;), but combining it with other languages. And yes, (almost) all of these language-specific build systems “know” how to build C libraries too, but even that is a PITA for anything a bit more complex (yes, yes, nobody needs to tell me that it works fine with Dune/cargo/…, just be glad if it works for you). So, for anything involving “other” languages (as in “real work”), in the long run it’s easier to use a “general” build system from the beginning, although it is more work (as in “less magic”) initially. Even Make is one of those, even if (well, because) it is limited in its capabilities.


Unfortunately the OCaml ecosystem is very small, and I see a lot of benefits in being able to leverage the larger Rust ecosystem, especially given how easily this is done at the code level. I don’t see how arguing about cargo/dune being fundamentally wrong helps bring this goal closer to reality, though.

Have you considered putting your metadata at the findlib level rather than the opam level?

Actually, no. I’m one of those irrational-casual-closed minds, and I’m very thankful that dune took a stab at working with findlib for me, so that I can be totally unaware of how it works under the hood.

One (selfish) benefit of this approach is that I could use the new META fields in my own build systems if I ever wanted Rust. That generalizes to other build systems.

That sounds interesting, but I was also quite selfish and came up with a solution which fits my happy dune path. As the majority of opam packages are now dune-backed, I think this should be a good starting point to gather more feedback. I’m open to improvements and to compatibility with the larger non-dune part of the community, though; I just don’t have enough experience to contribute in this direction.

Can findlib magic somehow help hook into the linking phase of the target executable and build the Rust staticlib behind the scenes? The necessity of having cargo bits in any project transitively depending on Rust is the most annoying part of my current approach, and if that could be automated, things should be much smoother for end users.

Well, use the build system (Dune) to do the building of the Rust staticlib. Findlib would be (if you used it) for finding the transitive closure and related metadata. I haven’t looked deeply into your project … how are you finding the transitive Rust dependencies that should be assembled into the Rust staticlib?

opam is a source-based package manager (mostly), so the source compiler (Rust) will always be necessary for any downstream package. But that doesn’t mean that there couldn’t be an opam package that uses depext to install Rust, and, for OSes whose system Rust compiler is too old or missing, downloads and runs rustup into an opam share directory, setting RUSTUP_HOME and CARGO_HOME for downstream packages (all of that is achievable within an opam file). I haven’t used Rust much, but the system requirements for OCaml (C compiler, linker and assembler) are a superset of Rust’s requirements (linker), so rustup should work. Windows rustup is harder since it needs MSVC or MSYS2, but those toolchains have good support in opam 2.2. Anyway, with effort it can be made mostly transparent for opam users.
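As a sketch, the OS-package part of such a conf-rust package could read as follows; distribution and package names are illustrative:

# conf-rust.opam: use the system Rust toolchain where it is recent enough
depexts: [
  ["cargo" "rustc"] {os-family = "debian"}
  ["rust" "cargo"] {os-distribution = "alpine"}
]
# Where it is too old or absent, a build: step could instead run rustup with
# RUSTUP_HOME and CARGO_HOME pointing inside the switch, as described above.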

Another reading of what @ostera meant would be that millions of Rust devs are very happy about cargo. I am not sure the devs that ship things to prod with C think C is great.


Reading that again, I actually got interested in some numbers (for OCaml too, to not be totally off-topic :wink:):
I’ve found estimates of about 26 million software developers in total worldwide, so let that be 30 million. Certainly less than a third of them are C or C++ developers (as there are JavaScript + TypeScript, Python, Java, C#, Ruby, PHP, … devs too) - so about 10,000,000 C and C++ devs. Rust devs are certainly less than one tenth the number of C and C++ devs, so at most 1 million. And OCaml should be at most again 1/10th of that, so 100,000.

But I’d guess Rust is more likely in the 1/20th range - so 500,000 - and OCaml too, so about 25,000 developers for OCaml.


Whatever the many unverifiable claims that have now been made in this thread, which started with a truncated quote of mine: they were not the point :–)

The point was: cargo and rust are not composable and interoperable systems. source: this thread.

Now, for something that aims at taking the place C currently holds, I find that rather peculiar. Also, I still think it’s worth pointing out the causes if we want these build experiences to improve.

From a broader perspective: between these new language-specific build/package-system hydras, which make uncomposable closed-world assumptions, and platform-specific package management systems, I nowadays routinely deal with 6-7 different incompatible tools, all basically doing the same thing with more or less grace. It’s not pleasant. So much for “developer experience”.
