Digging into bytecode compilation for the first time. Several questions

I started playing with ocamlc today, which I have mostly ignored until now, either preferring to use ocamlopt (via dune, mostly), or executing small programs directly with ocaml.

The first thing I tried to do was disassemble the bytecode, which I did not succeed in doing. I was directed to use ocamldumpobj, which unfortunately is not in my path. (I have OCaml installed from opam, not from OS packages). It was mentioned that ocamldumpobj is not always installed, and I may need to find dumpobj in the OCaml tools directory. I searched my opam installation and found a dumpobj.ml, but no executables.

So the first question is, what is the path forward for disassembling OCaml bytecode?


Next up, portability.

Like many Linux users, I keep user config files in git, along with the contents of my ~/bin directory for short useful scripts I like to carry around with me.

Also like many Linux users, I have a fancy shell prompt that gives various bits of useful information in fancy colors. In my case, this prompt is generated by an OCaml program (with a Python fallback in the ~/bin directory if I haven’t compiled the OCaml prompt on the given system yet).

Thinking about these portable bytecode binaries, I thought perhaps I could dispense with my Python fallback prompt and instead put compiled OCaml bytecode executables into my ~/bin directory.

I discovered two possible issues, one of which appears to be solvable and the other of which may not be. The first is that the path to the ocamlrun executable is hard-coded in the shebang. The first line of my compiled output:

#!/home/ninjaaron/.opam/5.3.0+flambda/bin/ocamlrun

Obviously this is not very portable. It contains my username and the path to a specific switch which may or may not exist on other machines.

I looked for a way to change this at compile time to something like #!/usr/bin/env ocamlrun, which I didn’t find, but if absolutely necessary, it would be trivial to automate changing this with sed or something.
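For anyone who would rather do that rewrite in OCaml than in sed, here is a minimal sketch. It assumes the file starts with a `#!` line and relies on my understanding that ocamlrun locates the bytecode sections from the end of the file, so changing the length of the first line should be harmless. The names `replace_shebang` and `patch_file` are made up for illustration.

```ocaml
(* Sketch: swap the first line of a bytecode executable for a new shebang.
   Assumption: the file begins with "#!" and the runtime finds its sections
   relative to the end of the file, not the start. *)

(* [replace_shebang contents interp] returns [contents] with its first line
   replaced by "#!" ^ interp. Raises [Invalid_argument] on unexpected input. *)
let replace_shebang contents interp =
  let has_shebang =
    String.length contents >= 2 && String.sub contents 0 2 = "#!" in
  if not has_shebang then invalid_arg "replace_shebang: no shebang line";
  match String.index_opt contents '\n' with
  | None -> invalid_arg "replace_shebang: no newline after shebang"
  | Some i ->
    (* Keep everything from the first newline onwards, byte for byte. *)
    let rest = String.sub contents i (String.length contents - i) in
    "#!" ^ interp ^ rest

(* Read a whole file in binary mode. *)
let read_file path =
  In_channel.with_open_bin path In_channel.input_all

(* Overwrite a file in binary mode. *)
let write_file path data =
  Out_channel.with_open_bin path (fun oc -> Out_channel.output_string oc data)

(* [patch_file path interp] rewrites the shebang of [path] in place. *)
let patch_file path interp =
  write_file path (replace_shebang (read_file path) interp)
```

Something like `patch_file "prompt.byte" "/usr/bin/env ocamlrun"` would then do the job (the file name is hypothetical). The file is read fully before it is reopened for writing, so the in-place overwrite is safe for small executables, though a crash mid-write would still leave a truncated file.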

It then occurred to me that the bytecode may not be stable between OCaml releases—and I soon discovered it’s actually worse than that. I have switches for 5.3.0 installed with and without flambda, and bytecode is not even compatible between these installations because the magic number is different.
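To compare two executables without running them, the magic string can be read straight out of the file. This is only a sketch, assuming (as far as I know) that a bytecode executable ends with a 12-byte magic string in its trailer. The function names are invented for the example.

```ocaml
(* Sketch: read the magic string from the end of a bytecode executable.
   Assumption: the last 12 bytes of the file are the magic string. *)
let magic_length = 12

(* Return the trailing magic of some file contents, if they are long enough. *)
let magic_of_contents contents =
  let n = String.length contents in
  if n < magic_length then None
  else Some (String.sub contents (n - magic_length) magic_length)

(* Same, reading the contents from a file in binary mode. *)
let magic_of_file path =
  magic_of_contents (In_channel.with_open_bin path In_channel.input_all)

(* True only when both files yield a magic string and the strings match. *)
let same_magic a b =
  match magic_of_file a, magic_of_file b with
  | Some x, Some y -> String.equal x y
  | _ -> false
```

Running `magic_of_file` on the output of each switch should show whether the two builds disagree, which is the incompatibility described above.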

So I guess my second question is, is there any way to have portable OCaml executables that are either a) not dependent on the architecture and libc (via ocamlopt) or b) not dependent on the details of the local OCaml environment (via ocamlc)?

Is it better to stick with my Python fallback? (My OCaml prompt script depends on a non-trivial library which deals with launching and communicating with processes, so executing it directly with ocaml is probably not a great option.)

If none of this is performance sensitive you can simply do

#!/usr/bin/env ocaml

let () = print_endline "hello"

If it does become performance sensitive, things like ocamlscript or b0caml (unreleased, though) may help.

This specific case is a bit performance sensitive, unfortunately, since the script executes every time my prompt renders, and even a 100ms lag is pretty annoying in this situation. The prompt script itself is short, but it links against some non-trivial libraries which take a little more time to compile (and ultimately the unix library is in there somewhere as well).

Here are some links that may be helpful:

This unfortunately won’t create an executable that is compatible with my Raspberry Pi. ocamlc does that, but it also assumes the OCaml environment is the same on the system where I compile and the system where I execute.

Anyone have any thoughts on the first question, how to disassemble OCaml bytecode when there’s no ocamldumpobj executable produced by my opam switch?

Maybe esperanto can solve your second issue?

Another possibility that I actually use is Esperanto. It’s a new OCaml toolchain (the README.md explains how to build a project with dune) based on the great Cosmopolitan project, which gives you the ability to craft a native “actually portable” executable. It works only for OCaml 4.14, but I have some plans to support OCaml 5.

Quick update, I found a way to read disassembled bytecode in RWO:

You can display the bytecode instructions in textual form via -dinstr.

Works great!
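For anyone else trying this, a tiny file is enough to get readable output. The sketch below assumes the flag is passed to ocamlc when compiling, as in `ocamlc -dinstr fact.ml`; the file name is just an example.

```ocaml
(* fact.ml: a small function whose bytecode is short enough to read.
   Assumed invocation: ocamlc -dinstr fact.ml *)
let rec fact n = if n <= 1 then 1 else n * fact (n - 1)

(* Call it once so the module has some top-level code as well. *)
let () = Printf.printf "%d\n" (fact 5)
```

The listing you get corresponds to the compilation unit being compiled, so this is a way to inspect code as you build it rather than a disassembler for an already linked executable.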

A lot of this is addressed by “Relocatable OCaml”, for which there may even be an RFC published by the end of the week. In slightly more detail:

  • When configured to, the embedded path of the shebang is removed - in addition to a PATH-search, the launcher prioritises an interpreter found in the same directory as the image (i.e. if you compiled your bytecode executable to be relocatable, and put both it and ocamlrun in /usr/local/bin, it will Just Work™, regardless of PATH)
  • In order to mitigate the chaos which this could cause between multiple versions of OCaml, Relocatable OCaml also implements a (relatively) simple name-mangling scheme which means that the bytecode executable’s launcher searches for an ocamlrun of the required version (i.e. in your example it will require an OCaml 5.3 version of ocamlrun) and, having found that, that interpreter will only load shared objects (e.g. the support stubs for the Unix library) for the same platform and configuration as the interpreter. For example, if you have configured OCaml 5.3 with the uniform float arrays, that runtime will not attempt to load bytecode stubs which were compiled for a runtime which supports flat float arrays.
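
The search order in the two points above can be sketched roughly as follows. This is an illustration only, not Relocatable OCaml's actual code: the function names are invented, the real launcher is not written like this, and the plain `"ocamlrun"` below stands in for whatever version-mangled name the scheme really uses.

```ocaml
(* Sketch of "same directory as the image first, then PATH".
   [exists] is passed in so the logic can be exercised without touching
   the disk; all names here are hypothetical. *)
let find_interpreter ~exists ~image_dir ~path_dirs name =
  (* The image's own directory takes priority over every PATH entry. *)
  let candidates = image_dir :: path_dirs in
  List.find_map
    (fun dir ->
       let candidate = Filename.concat dir name in
       if exists candidate then Some candidate else None)
    candidates

(* Split a PATH-style string on ':' (POSIX convention). Empty entries are
   dropped here for simplicity, although POSIX treats them as ".". *)
let split_path s =
  String.split_on_char ':' s |> List.filter (fun d -> d <> "")

let () =
  let path_dirs =
    split_path (Option.value (Sys.getenv_opt "PATH") ~default:"") in
  let image_dir = Filename.dirname Sys.executable_name in
  (* "ocamlrun" is a placeholder for the mangled interpreter name. *)
  match
    find_interpreter ~exists:Sys.file_exists ~image_dir ~path_dirs "ocamlrun"
  with
  | Some p -> print_endline ("would use " ^ p)
  | None -> print_endline "no interpreter found"
```

The point of the ordering is that an image and its interpreter placed side by side keep working no matter what PATH contains.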

I have experimented before with Relocatable OCaml and a custom header for bytecode using an αpε-style executable launcher (this apes the trick used by the linker in cosmopolitan). This allowed me to create a single bytecode executable which executes directly on Linux or Windows (i.e. it was physically the same binary - you obviously need to have the OCaml runtime installed on the system for it to be interpreted). However, this approach quickly runs into problems in bytecode, because many libraries (starting with OCaml’s own Unix library) do too many things at compile-time which should be deferred to runtime (TL;DR if you want truly portable bytecode, you must use the same OCaml sources for all platforms - so the fact that unix.ml is implemented differently between Unix and Windows kills true portability of the bytecode image).

This “Relocatable OCaml” is interesting for me personally. I’m less interested in being portable to Windows, and more interested that things will be portable on different POSIX systems and architectures.

In practice, this is almost always Linux for me, though I keep meaning to try out the BSDs and I’ve been flirting with the idea of getting a Mac for music production, so it is pleasant if my scripts can also work on those systems (one of the reasons I always ensure my shell scripts are POSIX-compliant and just switch to Python as soon as the standard sh won’t cut it.)

It’s still kind of a bummer that the same version of OCaml needs to be installed on both systems to get it to work, but I do understand why that is.


Maybe I’m thinking about the solution wrong. There’s a kind of stigma about keeping binary data in Git anyway (which I totally understand, even though people store binary data in Git all the time). Perhaps I should just keep the OCaml source file with my configs and install a script that compiles it at login if there is no executable available (and there is an ocaml compiler available).

This doesn’t really eliminate the need for a fallback prompt but it might at least remove the manual step of compiling my prompt on new systems once I get around to installing OCaml.

If you want portability for an executable, have you tried something like AppImage, or do you think it’s going to add too much startup time?

I haven’t tried AppImage. It seems like a very “heavy” solution for installing a prompt, but perhaps it would work well in practice. However, unless I’m mistaken, AppImage is also architecture dependent. I think it mainly solves dynamic linking issues, but this is just my second-hand understanding, so take it with a grain of salt.

Honestly, the current setup of having a fallback script in Python* works acceptably well. I just started playing with ocamlc, for which one of the stated goals is portable executables, and I was wondering if this could mean that I might be able to rewrite some of my portable tooling in OCaml, but from this thread it seems that this is still much less straightforward than simply using Python or Perl, unfortunately—and moreover, most of the proposed solutions have nothing to do with bytecode compilation.

*a language I’m not particularly fond of, but with a runtime which is pre-installed in most of my environments and will soon be installed if it isn’t, since I use it for a lot of my trivial tools.

My conclusion is that, if portability was the main goal, ocamlc is kind of a half-baked solution. It frees you from worrying about architecture, libc, and OS (with caveats), and instead allows you to worry about replicating the same OCaml setup on every system. I suppose this is easy to manage with ansible or containers or Nix, but I haven’t gotten as far as automating my personal machine setup in this way yet—but once you get to the point of replicating the same OCaml environment on every system, I guess it becomes tenable to simply recompile OCaml utilities on every machine.

You are not alone. Some years ago I had to give up using unison to synchronise files because it was too much trouble to have the exact same OCaml version on every machine I intended to use it on (unison used the stdlib marshaller to serialize its metadata from one peer to the other, hence this requirement). I thought at the time that a bytecode unison would be easy to copy around, but had to give up on this idea.

Note that a couple of years ago Unison switched to its own marshalling format and solved the incompatibility problem. All versions since that release are compatible with each other.

Using OCaml marshalling as a serialization format over the network is borderline scandalous for a number of reasons.
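
One reason that is easy to show (my reading, not necessarily what the poster had in mind) is that unmarshalling is not type-checked. In the sketch below, the annotation on `decoded` is the only thing that says what type comes back, and nothing verifies it at run time.

```ocaml
(* Round trip through the stdlib Marshal module. *)
let encoded : string = Marshal.to_string [1; 2; 3] []

(* [Marshal.from_string] returns a value of any type the caller asks for.
   The annotation happens to be right here, but a wrong one would still
   compile, and data from an untrusted peer gets no validation at all. *)
let decoded : int list = Marshal.from_string encoded 0

let () = List.iter (Printf.printf "%d ") decoded
```

A purpose-built wire format avoids both this and the dependence on the OCaml version described above.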
