Easy way to build and distribute OCaml executables for all the popular platforms?

Eventually, I would like some solution to this problem:

  • build (preferably) statically-linked executables for multiple architectures and operating systems
  • publish binary packages for the popular package managers

Even for opam users, it would be nice to have binary installations of executables with no runtime dependencies (static builds), because they're faster to install and don't depend on specific versions of source packages. For example, a static build of the atdpy executable would have no runtime dependency at all, for either itself or the code it generates. It's really just one executable file. It would be great to have an easy way to build such binaries and make them available via pip, since that's what Python users use. A source installation with opam would be too bothersome for them.

What have you used for this and how satisfied are you?

10 Likes

I’m in the middle of building an OCaml-based installer generator; an introduction is at dkml-install-api and the API is at dkml-install-api/odoc. The first target will be to generate a standalone installer for OCaml on Windows. However, the installer generator will be usable by any OCaml maintainer for their own Windows/macOS/Linux software. With that in mind, I’ll respond to a few things you mentioned:

  • Easy way to build and distribute OCaml executables…. If you mean just distributing a single executable, I’m sure you’ve already come across the Generating static and portable executables with OCaml post by Louis Gesbert at OCamlPro (a minimal dune sketch of that recipe follows this list). It might not be easy, but at least it is formulaic. But you may need to do more than just distribute a single binary. For the Windows installer we need Visual Studio, Git and a Unix shell installed, an OCaml system compiled in-place on the end-user machine, a system PATH tweak, some binaries, and at least one working Opam switch with preconfigured Opam repositories.

  • build (preferably) statically-linked executables. Much has been written elsewhere about static vs. dynamic linking. I lean towards dynamic linking because security auditing is much easier for the end-user, and dynamic linking is often required to comply with fellow maintainers’ software licenses that don’t have static linking exceptions. So … and I think this will interest you @mjambon … I’ve been using Python’s PEP-driven “manylinux” standard in-house. Basically, it is a spec requiring a conforming Python distribution to have a set of shared libraries (ex. libc, libglib2) with known minimum versions on Linux, plus test environments (Docker containers) to make sure any Python binary extensions you develop can be distributed on all the major Linux distributions. The manylinux Docker containers can catch Linux distribution gaps in C libraries used in some OCaml packages (ex. capnproto#1414) and are fairly easy to add to CI (ex. capnproto#1415). When Linux support becomes generally available in dkml-install-api, I expect to re-use these manylinux Docker containers to build portable dynamically-linked Linux binaries.

  • publish binary packages for the popular package managers … build such binaries and make them available via pip. Both system package managers and language package managers are being discussed here, so I’ll clarify my language: the installer generator I’m building is designed to install standalone software on the end-user system (sometimes that will involve system package managers like Homebrew, yum or apt) but uses a language package manager (opam) to build the installer itself. In fact opam is quite capable of downloading binaries, and dkml-install-api makes heavy use of that feature. See the .opam file in dkml-component-curl for an example where a binary executable generic/bin/curl.exe is installed that is either a) a download of curl.exe on an end-user Windows machine or b) a symlink to the system curl executable on an end-user macOS/Linux machine (a hedged opam sketch of this pattern also follows this list). I think it would be a straightforward extension to offer a local-dev-machine experience similar to pip install xyz, but someone else would need to focus on that.
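For the single-binary case, the OCamlPro recipe essentially amounts to building in an opam switch whose compiler targets musl and passing -static to the C linker. A minimal dune sketch, assuming such a switch (the executable and public names are illustrative):

    ; bin/dune -- statically linked executable, per the OCamlPro recipe.
    ; Assumes the switch uses a musl+static compiler variant (e.g. created
    ; with the ocaml-option-static and ocaml-option-musl packages); on a
    ; glibc switch, -static only gets you a partially static binary.
    (executable
     (name main)
     (public_name myexe)            ; illustrative name
     (link_flags (-ccopt -static)))

Running file on the result should report “statically linked”.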
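And to illustrate the opam-downloads-binaries pattern, here is a hedged sketch of an opam file that fetches and installs an external binary. This is not the actual dkml-component-curl file; the URL and checksum are placeholders:

    opam-version: "2.0"
    synopsis: "Sketch: install a downloaded binary via opam"
    # Fetch an extra file alongside the package sources at build time.
    extra-source "curl.exe" {
      src: "https://example.com/curl-win64.exe"   # placeholder URL
      checksum: "sha256=0000000000000000000000000000000000000000000000000000000000000000"
    }
    install: [
      # Place the binary under this package's share directory,
      # mirroring the generic/bin/curl.exe layout mentioned above.
      ["mkdir" "-p" "%{_:share}%/generic/bin"]
      ["cp" "curl.exe" "%{_:share}%/generic/bin/curl.exe"]
    ]

On macOS/Linux the real package instead symlinks the system curl, as described above.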

Although dkml-install-api is not yet released, if you or anyone else is interested in contributing or nudging the development, file an issue / send a PR / message me on Discuss.

Good luck, Jonah

7 Likes

Some more possibilities for distributing applications: Flatpak, Snap, AppImage. They have their downsides, though.

If your application is not a GUI, I’d recommend shipping it as a Docker container. That should be fairly portable and run on most distributions, even on distributions that do not have Docker available by default: they have podman, which does a pretty good job of running Docker containers (there is also now an OCI standard for containers).
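A minimal two-stage Dockerfile sketch of that approach (the image tags, paths and the myexe name are illustrative, and you may need to apk add whatever C libraries your executable links against):

    # Stage 1: build with the community ocaml/opam image (tag illustrative).
    FROM ocaml/opam:alpine AS build
    COPY --chown=opam . /home/opam/src
    WORKDIR /home/opam/src
    RUN opam install . --deps-only --yes
    RUN opam exec -- dune build --profile release bin/main.exe

    # Stage 2: ship only the binary on a small runtime image.
    FROM alpine:3.19
    COPY --from=build /home/opam/src/_build/default/bin/main.exe /usr/local/bin/myexe
    ENTRYPOINT ["/usr/local/bin/myexe"]

Since both stages are Alpine-based, the dynamically linked binary finds the same musl libc at runtime.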

Aside from a Docker container, you can ship packages built on the exact release you are targeting. Building on an older version and running on a newer one might work in some limited cases, but I’ve run into trouble with that in the past when upgrading between minor CentOS 7.x versions: there was an ABI breakage in libnss, but CentOS purposefully overrode the soname and claimed there was no ABI change, and of course things broke at runtime when built against the old NSS and run against the new one. The solution was to rebuild the application with the newer NSS as a minimum dependency.

Building and shipping individual executables without a package manager and without containers is likely to run into problems eventually (especially as newer distros get released). There was also an older standard, the Linux Standard Base (LSB), and you could configure your compiler to be LSB-compliant, i.e. only make the headers/libraries/symbols available that the LSB defines. In practice, however, some crucial symbols were missing, and not many distros are actually LSB-compliant (so e.g. trying to build and ship something like nginx using LSB didn’t quite work).

You could use a service such as openbuildservice.org that supports building packages for multiple distributions (both rpm and deb based): openSUSE:Build Service supported build targets - openSUSE Wiki

Another possibility is to define a list of distros you support and use containers to build a package for each distribution. This is actually fairly simple to automate and parallelize, and if your package is small, building it for all distros is done in no time, especially if you use Docker layer caching effectively to preinstall your build dependencies in a layer.

Nix closures might also work, but they’re a bit heavy: I think you would end up shipping everything, including a libc, which might be older than what the user has on their system. Then again, Docker images are the same in the end, except the libc version you get would be more “well known”.

Although pip might seem like a nice solution, in practice I’ve run into various compatibility issues, especially with libraries related to numpy, opencv and ghostscript. I ended up either having to install distro packages for some and use pip’s prebuilt packages for the rest, or forcing pip to rebuild all packages and not use binary packages at all. This is especially problematic when a new distro comes out, e.g. with a Python 3.9 -> 3.10 version bump that nothing on pip quite supports yet; none of the py3.9 prebuilt binaries would have worked in that environment. Here is one example where pip binary packages go wrong (especially as various pieces of the Python ecosystem upgraded their numpy dependencies at different rates): Install numpy-1.20.0rc1 causing errors · Issue #534 · dask/fastparquet · GitHub

3 Likes

This project by @jchavarri and @spyder builds atdgen binaries for 3 platforms using GitHub CI. It might be a good starting point.

A hackish solution, but one that might be good enough given that the resulting program is meant to be used by developers, is to compile atdpy with js_of_ocaml. It used to be possible for atdgen, and there’s no reason that would have stopped working. You’d end up with a single JS file that is easy to distribute and can run anywhere there’s Node.js. A dune sketch is below.
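For reference, a hedged dune sketch of what that looks like (assuming the code and its dependencies avoid C stubs, which js_of_ocaml cannot compile):

    ; bin/dune -- build the CLI as JavaScript via js_of_ocaml.
    ; (modes js) produces main.bc.js instead of a native executable.
    (executable
     (name main)
     (modes js))

The output can then be shipped as a single file and run with, e.g., node _build/default/bin/main.bc.js.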

3 Likes