[ANN] Esperanto, when OCaml meets Cosmopolitan

I am delighted to present the first experimental release of Esperanto. This project is a new OCaml toolchain that creates binaries compiled with the Cosmopolitan C library and linked with the αcτµαlly pδrταblε εxεcµταblε linker script. The resulting binary is then portable across different platforms:

Linux, Windows 10, macOS, FreeBSD, OpenBSD, NetBSD

The main objective of Esperanto is to provide a toolchain capable of producing a portable binary from an existing project. This finally makes it possible to distribute software for all these platforms without having to:

  1. Manage each platform separately: the Cosmopolitan C library offers the POSIX API on all platforms (including Windows).
  2. Produce a separate version of the software for each platform: a single binary runs on all of them.

Cosmopolitan does not, however, produce a binary that runs on multiple architectures. At this stage, our distribution only supports x86_64 (the most common one), but we are working on the possibility of producing binaries for other architectures.

I would like to give special thanks to Justine, the author of the Cosmopolitan project (which is used to develop redbean, a small portable HTTP server), for her excellent work.

A toolchain

In OCaml, the “toolchain” principle allows several compilers to coexist within an OPAM switch and lets you choose one of them when cross-compiling a project. Although this principle is not clearly defined and its use remains very limited, it exists through the ocamlfind tool.

You can find these toolchains in your switch:

$ ls $(opam var lib)/findlib.conf.d/
esperanto.conf solo5.conf

From our experience with Mirage as well as the work done in dune regarding cross-compilation, proposing a new toolchain to allow cross-compilation of projects with OPAM is both a historical choice and, in our opinion, the most relevant one [1].

Why do we need to cross-compile?

The term cross-compilation can be misunderstood if we only consider the question of the assembler emitted by the compiler (does it match the host assembler or not). In our case, cross-compilation is a broader term: it implies using artefacts external to the compiler that differ from the defaults, as well as compiler options that must be applied throughout the production of the final binary.

In other words, even though we are emitting the same assembler, we are doing so in a different “context” which requires the definition of a new toolchain which includes our artefacts and compiler options.

One of these artefacts is of course the C library used by the compiler, which is then systematically used by the OCaml runtime, the well-named libasmrun.a. This is why, for example, there is a version of OCaml with musl. So there must be a version of OCaml with Cosmopolitan.

This new toolchain also allows you to include the necessary options for compiling C files because, yes, you can compile a C file with ocamlopt.

In order to provide a coherent workflow for a project, we need to provide not only a libasmrun.a compiled with our Cosmopolitan C library but also an OCaml compiler capable of invoking the C compiler with the right options required by Cosmopolitan.

Finally, we also need to describe in this toolchain how to link the object files together to actually produce a portable binary using the APE script.

A simple example with this new toolchain

Installing Esperanto is very easy with OPAM. It will install the cross-compiler and the necessary files so that ocamlfind/dune can recognise this new toolchain:

$ opam install esperanto

Finally, let’s try to produce a simple binary that displays “Hello World!”:

$ cat >main.ml <<EOF
let () = print_endline "Hello World!"
EOF
$ ocamlfind -toolchain esperanto opt main.ml
$ objcopy -S -O binary a.out
$ file a.out
a.out: DOS/MBR boot sector

The binary produced can already be executed. However, you may run into two issues that have since been fixed upstream but whose fixes have probably not reached your system yet. They concern zsh and binfmt_misc in particular.

The first problem is that zsh does not recognise the binary correctly. This problem has been fixed in zsh 5.9.0.

$ zsh --version
zsh 5.8.1
$ zsh
$ ./a.out
zsh: exec format error: ./a.out

The second problem concerns binfmt_misc which intervenes upstream at the execution of your programs in order to choose how to execute them. In this case, binfmt_misc recognises Cosmopolitan binaries as Windows software by default.

Here too, a solution is available and described by the author of Cosmopolitan here: APE loader

Execution & Assimilation

If you are not concerned by the above problems, you can simply run the program:

$ ./a.out
Hello World!

There is a final solution that requires a little explanation of what αcτµαlly pδrταblε εxεcµταblε is. It makes it possible to create a polyglot binary whose first entry point is not your program but a small program that tries to recognise the platform on which the binary is running.

After this recognition, this little program “injects” the values corresponding to the platform on which you are running your program, and finally lets Cosmopolitan manage the translation between its interface and the real POSIX interface that your system offers.

Of course, this step has a cost as it adds an indirection between what your program wants to do and what is available on the system running your program. However, APE offers a very special option that allows the program to be assimilated to the platform in which it wants to run.

$ file a.out
a.out: DOS/MBR boot sector
$ sh -c "./a.out --assimilate"
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64
$ ./a.out
Hello World!

This option makes your application truly native to the platform in which you run it. This means above all that the program is no longer portable.
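
To make the difference between the two forms concrete, here is a minimal OCaml sketch that classifies a file by its first bytes. It is an illustration, not part of Esperanto: the type and function names are hypothetical, and it assumes that an unassimilated APE binary starts with the bytes "MZqFpD" while an ELF binary starts with "\x7fELF". Other formats (for example an assimilated binary on macOS) simply fall into the Unknown case.

```ocaml
(* Classify an executable by its leading magic bytes.
   Assumptions: an unassimilated APE binary starts with "MZqFpD",
   an ELF binary starts with "\x7fELF". *)

type format = Ape | Elf | Unknown

(* [starts_with ~prefix s] is true when [s] begins with [prefix]. *)
let starts_with ~prefix s =
  let n = String.length prefix in
  String.length s >= n && String.sub s 0 n = prefix

(* Decide the format from the first bytes of a file. *)
let classify header =
  if starts_with ~prefix:"MZqFpD" header then Ape
  else if starts_with ~prefix:"\x7fELF" header then Elf
  else Unknown

(* Read at most [n] bytes from the start of [path]. *)
let read_header ?(n = 8) path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let len = min n (in_channel_length ic) in
      really_input_string ic len)

let to_string = function
  | Ape -> "APE (still portable)"
  | Elf -> "ELF (assimilated or native)"
  | Unknown -> "unknown"

(* Usage: ./classify a.out *)
let () =
  match Sys.argv with
  | [| _; path |] -> print_endline (to_string (classify (read_header path)))
  | _ -> ()
```

Run on a.out before and after --assimilate, such a program should report the change of format that file shows above. Keeping a copy of the original binary is the simplest way to retain the portable version.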

Esperanto, dune & opam monorepo

The dune software also incorporates this toolchain idea using the -x option. More pragmatically, it is possible to define a new dune context to use Esperanto as a compilation toolchain.

However, the original aim of Esperanto is to produce a portable binary. This implies, among other things, that the binary should not depend on any external artefacts in order to run and, in this sense, the compilation of your project should be a static compilation. This means that all dependencies of your project must be available to compile in the same context as your project.

Again, this is particularly necessary if any of your dependencies include C files, so they need to be compiled in some way.

This is where opam monorepo comes in: it simply “vendors” your dependencies into a “duniverse” folder. Here are the steps needed to compile a project with Esperanto. We’ll take decompress as an example, which produces a binary that can compress/decompress documents:

$ git clone https://github.com/mirage/decompress
$ cd decompress
$ cat >>bin/dune <<EOF
(rule
 (target decompress.com)
 (enabled_if
  (= %{context_name} esperanto))
 (mode promote)
 (deps decompress.exe)
 (action (run objcopy -S -O binary %{deps} %{target})))
EOF
$ cat >dune-workspace <<EOF
(lang dune 2.0)
(context (default))
(context
 (default
  (name esperanto)
  (toolchain esperanto)
  (merlin)
  (host default)))
EOF
$ opam monorepo lock --build-only
$ opam monorepo pull
$ dune build bin/decompress.com
$ sh -c "echo 'Hello World' | ./bin/decompress.com -d | ./bin/decompress.com"
Hello World

Issues

Despite the results described above, the Esperanto toolchain is not complete. The OCaml distribution ships several libraries such as unix.cmxa and threads.cmxa. A little work has been done to make the former available. The latter is however unavailable for the moment since Cosmopolitan only partially implements pthread.

However, it seems that the author of Cosmopolitan wants to implement the rest of the pthread API, which will then allow us to provide support for threads.cmxa and OCaml 5.

This of course makes support for projects more limited than we imagined (and that’s why this release is experimental).

Future

As explained above, support for threads.cmxa and OCaml 5 remains the priority. However, an effort has already been made to support Lwt via Cosmopolitan’s hypothetical future support for pthread.

It is also possible that Cosmopolitan could become a target for the MirageOS project in the same way as Solo5 (or our recent experiment on Raspberry Pi 4).

In this sense, we will surely propose an integration in MirageOS so that projects can produce either unikernels with Solo5 or portable binaries with Cosmopolitan.


1: However, the question remains open at several levels, that of the compiler, that of OPAM and of course that of dune. It is clear that the current situation is not the best in terms of what we need to do to produce such a cross-compiler. Only the feedback from Solo5 (which requires cross-compilation) allows us to say that it is surely the right choice for what we want to offer.

Conclusion

We hope that this project will facilitate the distribution of software. You can read a more technical article about our work here. Finally, I would like to thank robur.io (an association you can help) for allowing me to do this project.

EDIT: The author of Cosmopolitan just released Cosmopolitan with pthread support. So we will definitely try to improve our distribution to include OCaml with threads.cmxa support and move forward with OCaml 5!


This is a glorious piece of work - we can even easily write UEFI programs in OCaml :slightly_smiling_face: It’s also unusually refreshing in OCaml-world to have an application which runs without caveats on Windows, but needs a few side-notes for running on Linux :smirk::wink:
