Relocatable compiler work

For some time now I have gotten the impression that some of the main tooling issues in the OCaml ecosystem are caused by the lack of a ‘relocatable compiler’. E.g. Opam local switches seem pretty heavy-weight - #3 by Armael

However when I look for OCaml repo issues related to that I find two issues which are marked as closed:

One can argue that we already do have that with the little-known opam-bin tool, but if we look at dev meeting minutes it is agreed that it’s too early to integrate it into opam due to security/reproducibility concerns, and that more people should try it out and report their findings. Of course, it’s a chicken-and-egg problem because I don’t think most people even know about it in the first place.

Interestingly, in the second issue above @gasche suggests:

  1. I would like to experiment with what happens, in an opam world, if you use #!/usr/bin/env ocamlrun; this should work as long as the right opam/$switch/bin is in your path, and it may be the simplest solution.

Which seems to me the simple and obvious solution to the problem, especially in a world where we’re acclimatized to running eval $(opam env) (usually automatically) to ensure that our environments are up-to-date. So I’m left wondering, would there be any issue with shipping this simple and obvious fix while we wait for the full-fledged relocatable compiler work to land?

I believe the latest update on this work was shared by David Allsopp at ICFP this past year. Here’s the recording of the talk I’m thinking about: [OCaml'22] Copying opam switches – it should Just Work™ - YouTube

Thanks. So the problem is two-fold:

  • How to find ocamlrun. The simple solution seems to be using #!/usr/bin/env ocamlrun to find it in the $PATH.
  • How to find the standard library. There seems to already be an environment variable OCAMLLIB for this; if opam env would set this variable then it should automatically work?
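
To make the two lookups concrete, here is a minimal sketch that models them in Python. It assumes only what is stated above: `env` picks the first `ocamlrun` on `$PATH`, and `OCAMLLIB`, when set, takes precedence over a compiled-in default. The helper names (`find_in_path`, `find_stdlib`, `make_fake_switch`) and the directory names are made up for illustration, and the real runtime's lookup has more steps than this.

```python
import os
import stat
import tempfile


def find_in_path(name, path):
    """Return the first executable called `name` in `path` (os.pathsep-separated)."""
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def find_stdlib(env, compiled_in_default):
    """Use OCAMLLIB from the environment if present, else the compiled-in default."""
    return env.get("OCAMLLIB", compiled_in_default)


def make_fake_switch(root, name):
    """Create <root>/<name>/bin/ocamlrun as an executable placeholder (hypothetical layout)."""
    bin_dir = os.path.join(root, name, "bin")
    os.makedirs(bin_dir)
    runtime = os.path.join(bin_dir, "ocamlrun")
    with open(runtime, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(runtime, stat.S_IRWXU)
    return bin_dir


with tempfile.TemporaryDirectory() as root:
    bin_a = make_fake_switch(root, "switch-a")
    bin_b = make_fake_switch(root, "switch-b")
    # Whichever switch comes first in PATH provides the runtime.
    resolved_ab = find_in_path("ocamlrun", os.pathsep.join([bin_a, bin_b]))
    resolved_ba = find_in_path("ocamlrun", os.pathsep.join([bin_b, bin_a]))
    picks_a_first = os.path.dirname(resolved_ab) == bin_a
    picks_b_first = os.path.dirname(resolved_ba) == bin_b

stdlib_from_env = find_stdlib({"OCAMLLIB": "/moved/switch/lib/ocaml"}, "/old/switch/lib/ocaml")
stdlib_default = find_stdlib({}, "/old/switch/lib/ocaml")
```

In this model both lookups depend entirely on the environment the executable happens to be started in, which is the property the rest of the discussion turns on.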

Yes, this is the work that @dra27 presented at the OCaml Workshop 2022. Yes, there are no PRs open or anything visible from the outside (but maybe some of @dra27’s current PRs are actually buildup work for that), so it feels a bit like vaporware. I ping @dra27 about it from time to time, I think that the topic is not buried (unlike, say, the idea of upstreaming some of the JIT work, which is probably buried), and he says that he will submit the work upstream someday.

If you find the current state strange, my short explanation is that there are very few people working on this and that they are all busy with many other things. (A way to solve this problem is to have more people working on this. Are people around interested in contributing?)

More concretely:

  • we were all busy with the Multicore merge for about a year, and upstream progress on other topics was frozen during the “sequential freeze”

  • now that things are heating up again, there is a shortage of reviewers/maintainers so some authors of “big patches” are waiting a bit for us collectively to find a good way to move forward. If I were @dra27 I would have hesitated to send a big PR between the OCaml workshop and now.

  • @dra27 is also working (often in collaboration with other people), off the top of my head, on:

    • improving the configuration and build system of the OCaml compiler (which helps for: relocatability, cross-compilation, supporting more hardware or OSes, etc.)
    • improving Windows support for OCaml 5.0 (he’s the reason we have a Windows port at all, now working on an MSVC version I believe)
    • maintenance tasks of general interest on the OCaml compiler
    • opam
    • helping maintain the opam-repository, the ocurrent CI, ocaml docker images etc.
    • ensuring that as many packages as possible are working well with the upcoming OCaml release
    • probably some management within OCamllabs/Tarides

    None of these things are clearly much less important for OCaml than relocatability. So personally I just roll with the flow and wait for whatever @dra27 decides to release/submit/do next.

Almost all of the things above could probably benefit from more people volunteering to help move them forward, which would be great for that thing and also free time for people currently involved to move on some of the other things faster.


Thank you for the details! Clearly, there is much to be done, and few hands to do it. What I’m driving at is, would there be interest in landing an interim solution that relies on compiler-opam coordination, but should be much easier to roll out, while we wait for David’s work to land and fix the issue in a nice and clean way?

Edit: actually before that, would my suggestion actually fix the issue (albeit clumsily)?

The 2021 version of relocatable was a prototype and quite reasonably wore the label of vaporware, but I’m pleased to say that everything presented at ICFP was (of course) very real. My development is rarely in private - anyone particularly interested to follow the workings of it, and the madness of my internal thoughts, can see my worklog for relocatable, various draft PRs on my OCaml fork (these exist to trigger CI runs on the individual branches), and the somewhat terrifying shell script which does the rebasing.

At present, I’ve got myself into the rather silly situation of two things being technically blocked on me - relocatable competes with getting the Windows parts of opam 2.2 finished, and I’m trying to focus on that at this precise moment (in addition to the list @gasche kindly assembled above :slightly_smiling_face:).

As I noted just before the Christmas holidays in What are the biggest reasons newcomers give up on OCaml? - #73 by dra27, I’m intending to have the PRs opened in enough time so that there’s a chance to include it in 5.1, which means ideally the PRs should start appearing in late February. I’m not opening the PRs in bits until I have a complete “vertical” solution (including the opam packaging), because what’s happened at various points is that changes in a later PR require something to be tweaked in an earlier branch… this is why the various parts of it are all stacked together. As demonstrated on my fork, I also intend to propose that these be backported or possibly even back-released for 4.08+, so when relocatable lands, it will land for a considerable number of compilers at once. That increases the need for the patches to be battle-tested and utterly complete in advance.

This proposal makes opam switches less stable, not more so. At present, when you have bytecode executables built with two different switches, they are guaranteed to execute the correct runtime, but may (and do) pick up the wrong C stubs. Switching to using env as the first mechanism extends this instability to the selection of the runtime (relocatable uses env as a fallback mechanism, and uses name mangling to eliminate the risk of picking up the wrong runtime). I (attempted to) cover this in Slide 8 of my workshop talk (“Solution Part 2: Finding the right things”).
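
A small model of that difference, as a sketch only: the thread does not spell out relocatable's actual name-mangling scheme, so the suffixed names `ocamlrun-aaaa` and `ocamlrun-bbbb` and the helper `first_match` are invented for illustration.

```python
def first_match(name, path_dirs, contents):
    """Return 'dir/name' for the first directory in path_dirs that contains name."""
    for directory in path_dirs:
        if name in contents[directory]:
            return f"{directory}/{name}"
    return None


# Hypothetical layout: each switch ships a plain and a mangled runtime name.
contents = {
    "/switch-a/bin": {"ocamlrun", "ocamlrun-aaaa"},
    "/switch-b/bin": {"ocamlrun", "ocamlrun-bbbb"},
}

# The executable was built in switch A, but the environment belongs to switch B.
path_dirs = ["/switch-b/bin", "/switch-a/bin"]

# Plain lookup: whichever runtime is first on PATH wins, here the wrong one.
plain = first_match("ocamlrun", path_dirs, contents)

# Mangled lookup: only the runtime the executable was built against can match.
mangled = first_match("ocamlrun-aaaa", path_dirs, contents)

# If switch A is not on PATH at all, the mangled lookup fails instead of
# silently running a different runtime.
missing = first_match("ocamlrun-aaaa", ["/switch-b/bin"], contents)
```

With the plain name, the result changes with the order of `PATH`; with a mangled name, the lookup either finds the matching runtime or finds nothing.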

The failure mode of this is really bad - relying on environment variables to do this definitely decreases the stability of opam switches because we know from the bug reports, discussions, etc. that programs do get called with the incorrect environment. opam-repository already does a similar trick for CAML_LD_LIBRARY_PATH and it’s dreadful (see Only set CAML_LD_LIBRARY_PATH for system switches · Issue #16406 · ocaml/opam-repository · GitHub) - relocatable is meant to do away with this kind of messing around with environment variables.


I don’t think they do, I’m afraid! The original case for relocatable in November 2019 was actually inspired by real-life errors arising from trying to use environment variable tricks like this - the fact that it allows for fast switches came later! I’ve just pushed the slides from that talk to Github - but the solution proposed there differs from the version presented last year. The three problems presented at the start of those slides were all real and came from actual failures which had to be debugged (and not just for beginners).


@dra27 so, playing the role of the accommodating radio host here, if people wanted to contribute to your relocatable stuff (in small ways, not “I get inside David’s brain and finish the 80% of polish bits missing before the PR”), what could they do?

Your work log has some potential answers, there are many TODOs in there. The Future work section has small stuff that I think could be reasonable suggestions, for example:

  • -use-runtime should actually conflict with -custom (this extends a fix in long-shebangs a bit further)
  • caml_parse_ld_conf appears to have a totally broken read loop - there’s no check that it actually read the whole file and EINTR is not handled?! The latter may not matter, but shouldn’t we ensure that we’ve read to EOF?!
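
For readers unfamiliar with the second item, this is the general shape of a read loop that does check for EOF and retries on EINTR. It is a sketch of the technique in Python, not a reproduction or a patch of `caml_parse_ld_conf` (which is C), and the `ld.conf` contents and the tiny `chunk_size` are made up to force several short reads.

```python
import os
import tempfile


def read_whole_file(fd, chunk_size=4):
    """Read from fd until EOF, tolerating short reads and interrupted calls."""
    chunks = []
    while True:
        try:
            chunk = os.read(fd, chunk_size)
        except InterruptedError:
            # EINTR: retry. Python 3.5+ already retries internally; this branch
            # is kept only to mirror what the equivalent C loop has to do.
            continue
        if not chunk:
            # An empty read is the only reliable end-of-file signal.
            break
        chunks.append(chunk)
    return b"".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    ld_conf = os.path.join(tmp, "ld.conf")
    with open(ld_conf, "wb") as handle:
        handle.write(b"/usr/lib/ocaml/stublibs\n/usr/lib/ocaml\n")
    fd = os.open(ld_conf, os.O_RDONLY)
    try:
        data = read_whole_file(fd)
    finally:
        os.close(fd)

search_dirs = data.decode().splitlines()
```

The point of the loop is that a single `read` call may return fewer bytes than requested, so the caller keeps going until a read returns nothing.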

Do you have other suggestions that would actually help sooner / shorten the critical path, and are still reasonably easy to understand and pick up for a motivated outsider?


Thank you for explaining that–it is indeed quite tricky. The first few slides of this PDF would make an excellent motivating GitHub issue, imho. Some notes on the example problems:

  1. The reason given is:

CAML_LD_LIBRARY_PATH was set for an opam switch

Does this mean that because opam set this variable, camlp4 couldn’t load a shared library from a non-opam OCaml installation? In that case shouldn’t our recommendation be to use a properly configured opam switch to run camlp4 as that is officially supported by the OCaml Platform?

  2. Not a question, just tried out this one and noticed that unfortunately opam doesn’t recalculate the correct paths for all the opam env variables when a local switch is moved; it only calculates the OPAM_SWITCH_PREFIX correctly. I have filed a bug report.

  3. I actually tried this one (ocamlc -o hello -custom hello.ml) and it worked but only after manually setting the opam env variables by interpolating the $OPAM_SWITCH_PREFIX value as I suggested in my report. Without the -custom flag we get the bad interpreter error as it tries to look in the hardcoded old directory (which as I suggested above could potentially be solved by using #!/usr/bin/env ocamlrun but only after the opam env issue would be fixed).
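
By “interpolating the $OPAM_SWITCH_PREFIX value” I mean roughly the following. This is a sketch: the set of variables and the directory layout are my assumptions, modeled on what a typical `opam env` prints, and `switch_env` and the `PATH_ENTRY` key are hypothetical names used only here.

```python
def switch_env(prefix):
    """Derive switch-related variables from the switch prefix alone (assumed layout)."""
    lib = f"{prefix}/lib"
    return {
        "OPAM_SWITCH_PREFIX": prefix,
        "CAML_LD_LIBRARY_PATH": ":".join(
            [f"{lib}/stublibs", f"{lib}/ocaml/stublibs", f"{lib}/ocaml"]
        ),
        "OCAML_TOPLEVEL_PATH": f"{lib}/toplevel",
        # Hypothetical key: the directory that would be prepended to PATH.
        "PATH_ENTRY": f"{prefix}/bin",
    }


old_prefix = "/home/me/project/_opam"
new_prefix = "/home/me/moved-project/_opam"

stale = switch_env(old_prefix)
fresh = switch_env(new_prefix)

# After recomputing from the new prefix, nothing should still point at the old one.
still_stale = [name for name, value in fresh.items() if old_prefix in value]
```

If every variable is derived from the prefix like this, moving the switch only requires the prefix itself to be recalculated.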

In any case, I agree that your work is probably closer to landing than anything else.

Yes, that’s right - the problem with the recommendation is that opam also allows a certain level of interoperation with “system” packages. OCaml 5 is making this slightly “worse”, as we have an increase in bytecode-only architectures. The problem (entirely in my opinion!) with recommendations is that people often then don’t follow them… the aim with all of this has been to ensure that we don’t need recommendations, because the only way available is the right way :slightly_smiling_face:


The big “TODO” list at the start may well be out of date (the perils of a work log…); the check-lists in the sections for individual branches are not out of date, though. For example, in 2019 it looked feasible to push this back to 3.12 potentially, but at this point I have no intention of back-porting past 4.08 and I don’t think it would be worth anyone trying. The “Future work” was where I was dumping things I’d spotted along the way which I think would be worth fixing at some point, but they’re certainly not on the critical path and others may not agree that they want fixing!

There are two things which could be very profitably investigated separately, because they could just use the existing back-port branches (e.g. the backport-4.14 branch). The first would be to investigate the changes needed in both ocamlbuild and dune to support the new -suffixed option for ocamlmklib. In relocatable, instead of installing dllunix.so, the compiler now installs something like x86_64-pc-linux-gnu-dllunix-aaab.so and the idea is that other build systems which generate C stub libraries would use the same naming convention.
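
As a rough illustration of what a build system would have to compute, here is a sketch that rebuilds the example name from its apparent parts. The split into target triplet, stub name and a short identifier is a guess from that single example, and `suffixed_stub_name` is a hypothetical helper, not part of any tool.

```python
def suffixed_stub_name(triplet, stub, runtime_id, extension=".so"):
    """Build '<triplet>-dll<stub>-<runtime_id><extension>' (assumed convention)."""
    return f"{triplet}-dll{stub}-{runtime_id}{extension}"


# Name installed today, and the relocatable-style name from the example.
legacy = "dllunix.so"
suffixed = suffixed_stub_name("x86_64-pc-linux-gnu", "unix", "aaab")
```

The investigation would be about where ocamlbuild and dune currently hard-code the `legacy` form and how they would obtain the extra components.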

The other thing would be to investigate and potentially propose patches for Dune’s caching system. From early experimentation, I don’t think that Dune’s caching shares the compilation artefacts between switches (because the path differs) but it would be interesting both to verify that (if necessary) and see what could be done to allow Dune to understand that an artefact compiled by ~/.opam/4.14-switch1/bin/ocamlopt can be reused instead of calling ~/.opam/4.14-switch2/bin/ocamlopt if ~/.opam/4.14-switch1/bin/ocamlopt and ~/.opam/4.14-switch2/bin/ocamlopt are identical (which, with relocatable, they are).
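
The difference between the two behaviours can be sketched as two ways of computing a cache key. This is not Dune's actual cache implementation; `key_by_path` and `key_by_content` are hypothetical, and the "compiler" files are placeholders with identical bytes.

```python
import hashlib
import os
import tempfile


def digest(data):
    """Hex SHA-256 of a bytes value."""
    return hashlib.sha256(data).hexdigest()


def key_by_path(compiler_path, source):
    """Cache key that identifies the compiler by where it lives."""
    return digest(compiler_path.encode() + b"\0" + source)


def key_by_content(compiler_path, source):
    """Cache key that identifies the compiler by the bytes of its binary."""
    with open(compiler_path, "rb") as handle:
        compiler_digest = digest(handle.read())
    return digest(compiler_digest.encode() + b"\0" + source)


with tempfile.TemporaryDirectory() as root:
    compilers = []
    for switch in ("4.14-switch1", "4.14-switch2"):
        bin_dir = os.path.join(root, switch, "bin")
        os.makedirs(bin_dir)
        path = os.path.join(bin_dir, "ocamlopt")
        with open(path, "wb") as handle:
            handle.write(b"identical compiler bytes")  # placeholder binary
        compilers.append(path)

    source = b'let () = print_endline "hello"\n'
    path_keys = [key_by_path(p, source) for p in compilers]
    content_keys = [key_by_content(p, source) for p in compilers]

shared_by_path = path_keys[0] == path_keys[1]
shared_by_content = content_keys[0] == content_keys[1]
```

Keyed by path, the two switches never share an artefact; keyed by content, they do as soon as the compiler binaries are byte-for-byte identical.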

Two of the root PRs, exe-executing and empty-env, unusually have had the PR description written before the code itself, and could be picked up by someone else. They’re both quite Windows-y, though. Similarly, I have only started unified-header, but unless one is willing to do simultaneous Windows/Unix development it’s not for the faint-hearted!

The individual branches (which are “local PRs” on my fork) all need rebasing on to latest trunk and then the branches need reassembling … in theory, the instructions (and rr-cache commits) in dra27/relocatable should allow the work to be reassembled. That’s a somewhat tedious piece of work, but the payback would be familiarity with the changes when it comes to reviewing it (it’s the first thing I’ll be doing when I pick this back up).
