OCaml compiler development newsletter, issue 1: before May 2021

Hi Discuss,

I’m happy to introduce the first issue of the “OCaml compiler development newsletter”. I asked frequent contributors to the OCaml compiler codebase to write a small blurb on what they have been doing recently, in the interest of sharing more information on what people are interested in, looking at and working on.

This is by no means exhaustive: many people didn’t end up having the time to write something, and it’s fine. But hopefully this can give a small window on development activity related to the OCaml compiler, structured differently from the endless stream of Pull Requests on the compiler codebase.

(This initiative is inspired by the excellent Multicore newsletter. Please don’t expect that it will be as polished or consistent :yo-yo: .)

Note:

  • Feel free of course to comment or ask questions, but I don’t know if the people who wrote a small blurb will be looking at the thread, so no promises.

  • If you have been working on the OCaml compiler and want to say something, please feel free to post! If you would like me to get in touch next time I prepare a newsletter issue (some random point in the future), please let me know by email at (gabriel.scherer at gmail).


@dra27 (David Allsopp)

Compiler relocation patches now exist. There are still a few left to write, and they need splitting into reviewable PRs, but the core features are working. A compiler installation can be copied to a new location and still work, meaning that local switches in opam may in theory be renamed and, more importantly, we can cache previously-built compilers in an opam root to allow a new switch’s compiler to be a copy. This probably won’t be reviewed in time for 4.13, although it’s intended that, once merged, opam-repository will carry back-ports to earlier compilers.

A whole slew of scripting pain has led to some possible patches to reduce the use of scripts in the compiler build to somewhat closer to none.

FlexDLL bootstrap has been completely overhauled, reducing build time considerably. This will be in 4.13 (#10135).

@nojb (Nicolás Ojeda Bär)

I am working on #10159, which enables debug information in -output-complete-exe binaries. It uses incbin under Unix-like systems and some other method under Windows.

@gasche (Gabriel Scherer)

I worked on bringing more PRs to a decision (merge or close). The number of open PRs has gone from 220-ish to 180, which feels nice.

I have also contributed to @Ekdohibs’ project camlboot, which is a “bootstrap-free” implementation of OCaml able to compile the OCaml compiler itself. It currently targets OCaml 4.07 for various reasons. We were able to do a full build of the OCaml compiler, and check that the result produces bootstrap binaries that coincide with upstream bootstraps. This gives extremely strong confidence that the OCaml bootstrap is free from “trusting trust” attacks. For more details, see our draft paper.

with @Octachron (Florian Angeletti)

I worked with Florian Angeletti on deprecating certain command-line warning-specifier sequences, to avoid usability issues with (new in 4.12) warning names. Before, -w -partial-match disabled warning 8, but -w -partial was interpreted as the sequence -w -p -w a -w r -w t -w i -w a -w l, most of which are ignored, but -w a silences all warnings. Now multi-letter sequences of “unsigned” specifiers (-p is signed, a is unsigned) are deprecated. (We first deprecated all unsigned specifiers, but Leo White tested the result and remarked that -w A is common, so now we only warn on multi-letter sequences of unsigned specifiers.)
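
To make the old letter-by-letter reading concrete, here is a minimal sketch of how such a specifier string decomposes. It is a toy model written for this note, not the compiler's actual parser: the names `token`, `legacy_tokens` and `has_unsigned_run` are made up, and numbers, ranges and warning names are deliberately not modeled.

```ocaml
(* Toy model of the legacy, letter-by-letter reading of a -w argument.
   Assumption: only single characters are modeled (no numbers, ranges, names). *)
type token =
  | Signed of char * char  (* a sign ('+', '-' or '@') followed by one character *)
  | Unsigned of char       (* a bare letter, e.g. 'a' or 'A' *)

let legacy_tokens (spec : string) : token list =
  let n = String.length spec in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match spec.[i] with
      | ('+' | '-' | '@') as sign when i + 1 < n ->
          (* a sign consumes exactly one following character *)
          go (i + 2) (Signed (sign, spec.[i + 1]) :: acc)
      | c -> go (i + 1) (Unsigned c :: acc)
  in
  go 0 []

(* True when two unsigned specifiers appear back to back, which is the
   shape of input that the deprecation described above targets. *)
let rec has_unsigned_run = function
  | Unsigned _ :: Unsigned _ :: _ -> true
  | _ :: rest -> has_unsigned_run rest
  | [] -> false
```

In this model, `legacy_tokens "-partial"` gives a signed `-p` followed by the unsigned letters a, r, t, i, a, l, so `has_unsigned_run` holds for it, while a lone `A` is a single unsigned specifier and is not flagged.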

I am working with @Octachron (Florian Angeletti) on grouping signature items when traversing module signatures. Some items are “ghost items” that are morally attached to a “main item”; the code mostly ignores this, which creates various bugs in corner cases. This is work that Florian started in September 2019 with #8929, to fix a bug in the reprinting of signatures. I only started reviewing in May-September 2020 and we decided to do sizeable changes; he split it into several smaller changes in January 2021 and we merged it in April 2021. Now we are looking at fixing other bugs with his code (#9774, #10385). Just this week Florian landed a nice PR fixing several distinct issues related to signature item grouping: #10401.

@xavierleroy (Xavier Leroy)

I fixed #10339, a mysterious crash on the new Macs with “Apple silicon”. This was due to an ARM (32 and 64 bits)-specific optimization of array bound checking, which was not taken into account by the platform-independent parts of the back-end, leading to incorrect liveness analysis and wrong register allocation. #10354 fixes this by informing the platform-independent parts of the back-end that some platform-specific instructions can raise. In passing, it refactors similar code that was duplicating platform-independent calculations (of which instructions are pure) in platform-dependent files.
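
Why a missing “can raise” flag breaks liveness can be shown on a toy model. The sketch below is only an illustration under simplifying assumptions, not the code of #10354: registers are plain integers, and `instr`, `live_before` and `bound_check` are hypothetical names.

```ocaml
module RegSet = Set.Make (Int)

(* A toy instruction: registers it defines, registers it reads, and whether
   it may raise (for instance a fused array bound check). *)
type instr = { defs : RegSet.t; uses : RegSet.t; can_raise : bool }

(* One step of backward liveness. If the instruction can raise, every
   register the exception handler needs must also be kept live across it. *)
let live_before ~live_at_handler instr live_after =
  let across = RegSet.diff live_after instr.defs in
  let across =
    if instr.can_raise then RegSet.union across live_at_handler else across
  in
  RegSet.union instr.uses across

(* A bound check reading registers 1 and 2 and defining nothing. *)
let bound_check ~can_raise =
  { defs = RegSet.empty; uses = RegSet.of_list [ 1; 2 ]; can_raise }
```

With a handler that needs register 5, the check flagged `can_raise:true` keeps 1, 2 and 5 live before it. With the flag wrongly set to `false`, register 5 drops out of the live set, so a register allocator would be free to reuse it, which is the kind of failure described above.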

I spent quality time with the Jenkins continuous integration system at Inria, integrating a new Mac Mini M1. For unknown reasons, Jenkins ran the CI script in x86-64 emulation mode, so we were building and testing an x86-64 version of OCaml instead of the intended ARM64 version. A bit of scripting later (8b1bc01c3) and voilà, arm64-macos is properly tested as part of our CI.

Currently, I’m reading the “safe points” proposal by Sadiq Jaffer (#10039) and the changes on top of this proposed by Damien Doligez. It’s a necessary step towards Multicore OCaml, so we really need to move forward on this one. It’s a nontrivial change involving a new static analysis and a number of tweaks in every code emitter, but things are starting to look good here.

@mshinwell (Mark Shinwell)

I did a first pass of review on the safe points PR (#10039) and significantly simplified the proposed backend changes. I’ve also been involved in discussions about a new function-level attribute to cause an error if safe points (including allocations) might exist within a function’s body, to make code that currently assumes this robust. There will be a design document for this coming in due course.

I fixed the random segfaults that were occurring on the RISC-V Inria CI worker (#10349).

In Flambda 2 land we spent two person-days debugging a problem relating to Infix_tag! We discovered that the code in OCaml 4.12 onwards for traversing GC roots in static data (“caml_globals”) is not correct if any of the roots are closures. This arises in part because the new compaction code (#9728) has a hidden invariant: it must not see any field of a static data root more than once (not even via an Infix_tag). As far as we know, these situations do not arise in the existing compiler, although we may propose a patch to guard against them. They arise with Flambda 2 because in order to compile statically-allocated inconstant closures (ones whose environment is partially or wholly computed at runtime) we register closures directly as global roots, so we can patch their environments later.

@garrigue (Jacques Garrigue)

I have been working on a number of PRs fixing bugs in the type system, which are now merged:

  • #10277 fixes a theoretical bug in the principality of GADT type inference (#10383 applies only in -principal mode)
  • #10308 fixes an interaction between local open in patterns and the new syntax for introducing existential type variables
  • #10322 is an internal change using a normal reference inside of a weak one for backtracking; the weak reference was an optimization when backtracking was a seldom used feature, and was not useful anymore
  • #10344 fixes a bug in the delaying of the evaluation of optional arguments
  • #10347 cleans up some code in the unification algorithm, after a strengthening of universal variable scoping
  • #10362 fixes a forgotten normalization in the type checking algorithm

Some are still in progress:

  • #10348 improves the way expansion is done during unification, to avoid some spurious GADT related ambiguity errors
  • #10364 changes the typing of the body of the cases of pattern-matchings, making it possible to warn in some non-principal situations; it also uncovered a number of principality-related bugs inside the type-checker

Finally, I have worked with Takafumi Saikawa (@t6s) on making the representation of types closer to its logical meaning, by ensuring that one always manipulates a normalized view in #10337 (large change, evaluation in progress).

@let-def (Frédéric Bour)

For some time, I have been working on new approaches to generate error messages from a Menhir parser.

My goal at the beginning was to detect and produce a precise message for the ‘let ;’ situation:

let x = 5;
let y = 6
let z = 7

LR detects an error at the third ‘let’ which is technically correct, although we would like to point the user at the ‘;’ which might be the root cause of the error. This goal has been achieved, but the prototype is far from being ready for production.

The main idea to increase the expressiveness and maintainability of error context identification is to use a flavor of regular expressions.
The stack of a parser defines a prefix of a sentential form. Our regular expressions are matched against it. Internal details of the automaton do not leak (no reference to states): the regular language is defined by the grammar alone.
With appropriate tooling, specific situations can be captured by starting from a coarse expression and refining it to narrow down the interesting cases.
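
As a rough illustration of the idea (not the prototype's actual syntax or implementation), one can picture the parser stack as a list of grammar symbols and match a small regular expression against it. Everything below is hypothetical: the `re` type, the matcher, and the symbol names `LET`, `SEMI`, `expr` and so on are invented for this sketch.

```ocaml
(* Hypothetical sketch: grammar symbols are plain strings, bottom of stack first. *)
type symbol = string

type re =
  | Sym of symbol   (* exactly this grammar symbol *)
  | Any             (* any single symbol *)
  | Seq of re * re
  | Alt of re * re
  | Star of re

(* Continuation-passing matcher: [k] receives what is left of the stack. *)
let rec run (re : re) (input : symbol list) (k : symbol list -> bool) : bool =
  match re with
  | Sym s -> (match input with x :: rest when x = s -> k rest | _ -> false)
  | Any -> (match input with _ :: rest -> k rest | [] -> false)
  | Seq (a, b) -> run a input (fun rest -> run b rest k)
  | Alt (a, b) -> run a input k || run b input k
  | Star a ->
      k input
      || run a input (fun rest ->
             (* require progress so that Star always terminates *)
             List.length rest < List.length input && run (Star a) rest k)

(* The whole stack must be in the language of the expression. *)
let matches (re : re) (stack : symbol list) : bool =
  run re stack (fun rest -> rest = [])

(* Coarse pattern: somewhere in the stack, an expression followed by ';'
   and then a 'let'. *)
let semicolon_then_let : re =
  Seq (Star Any, Seq (Sym "expr", Seq (Sym "SEMI", Seq (Sym "LET", Star Any))))
```

Under these assumptions, a stack such as `["LET"; "ident"; "EQUAL"; "expr"; "SEMI"; "LET"; "ident"; "EQUAL"; "expr"]` matches `semicolon_then_let`, while a stack holding only the first binding does not. Refining the pattern would then mean replacing the `Star Any` parts with more specific expressions.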

Now I am focusing on one specific point of the ‘error message’ development pipeline: improving the efficiency of ‘menhir --list-errors’.
This command is used to enumerate sentences that cover all erroneous situations (as defined by the LR grammar). On my computer and with the OCaml grammar, it takes a few minutes and quite a lot of RAM. Early results are encouraging and I hope to have a PR for Menhir soon. The performance improvement we are aiming for is to make the command almost real time for common grammars and to tackle bigger grammars by reducing the memory needs.
For instance, in the OCaml case, the runtime is down from 3 minutes to 2–3 seconds and memory consumption goes from a few GiB down to 200 MiB.

This is wonderful to read! Thanks all for sharing, and @gasche for organizing the sharing, and most of all for the excellent and sustained work. :tada:

That is so nice to have, thank you @gasche and the ocaml team

Cool work and fun read. I didn’t know about the diverse double compilation technique to thwart “trusting trust”, very nice and simple idea.

Something that is not clear to me (but I read quickly) is the impact of Guile itself not being bootstrapped yet. Could there be a very elaborate attack (with probability 0 of existing) on both the Guile and OCaml bootstraps, or is there something in the whole scheme that prevents it?

Yes, currently Guile needs to be trusted, and it would be possible that a bootstrapping virus in Guile would break our correctness result. (It would need to reproduce itself through our compiler and interpreter that were written after Guile itself, but this could be done with an almost-infinitely-clever program analysis.) Of course, an attack at the source level (inserting malicious source, instead of malicious binaries) is also possible anywhere in the chain.
Our main reason for using Guile is that this is the high-level language community most active on debootstrapping-towards-the-metal (through the Guix connection), so we believe it is more likely to manage debootstrapping and maintain it in the longer run.

(The seed that Guile depends on is its macro-expander, which is written using macros itself. In theory one may perform the macro-expansion of the expander, and then manually review the two versions to verify the absence of attack there.)

Thanks! Also just one comment: I read the paper on a monochrome display and I found using o in the diagrams to label OCaml sources to be confusing. E.g. o: ocamlopt. My brain always wanted to associate that with C object files. Maybe use ml?

Thank you, this initiative is much, much appreciated!

In practice when I wanted to rename a switch I’d resort to a little hackish method involving:

  • a symlink to the old switch in .opam, carrying the new name,
  • and a renaming of the switch in .opam/config

this hadn’t broken on me yet, but I was fully expecting it to. Are there any cases you know of / can guess where it may? @dra27

That doesn’t sound like a local switch? Did you start with opam switch create foo and then rename .opam/foo to .opam/bar, symlink .opam/foo to point to bar and update .opam/config to have bar listed as a switch instead of foo?

For some reason @octachron’s contribution to the newsletter got lost in my pipeline. So below it is.

@octachron (Florian Angeletti)

  • With Sébastien, David, and Gabriel’s help, I have finally merged the
    change needed to integrate odoc in our documentation pipeline.
    Currently, this is hidden behind a configuration switch (or specific
    Makefile’s target).
    The user experience is still a bit rough; in particular it requires an
    up-to-date trunk version of odoc. Fortunately,
    the number of users right now is most probably only one. My current
    plan is to see how well the maintenance goes during this release cycle
    before maybe switching to odoc for the 4.13.0 version of the manual.

  • I have been discussing with David about how much time and effort we
    should spend on testing the manual. (My opinion is that testing only the
    PRs that alter the manual’s source files is essentially fine.) David has,
    however, been testing a more thorough configuration, but that requires some
    more tuning to avoid sending scary emails to innocent passersby.

Close, instead of renaming then symlinking, I kept the old name and created a symlink carrying the new name:
opam switch create foo
ln -sr .opam/foo .opam/bar
sed -i 's/"foo"/"bar"/' ← for demonstration, in reality I edited names manually because I believed a sed -i may match too much.

Local switches are created with opam switch create path (usually just opam switch create . in your project directory) and store the switch in _opam in that directory. However the switch breaks if you rename the directory containing _opam (which is a more forgivable thing to do than messing around in ~/.opam :wink:). You could do a similar renaming trick with local switches, except that as with your example, you’re not really renaming anything - it works because the original path still exists.

The relocation patches allow the compiler to work correctly with an actual mv or, more usefully for opam, they allow you to cp -a the entire compiler to create a new, distinct compiler installation.

Thank you @gasche! This is fantastic reading! OCaml development switching to GitHub, now this, good job!

Thank you @gasche. It is very nice to be able to follow the development of the compiler.
I take this opportunity to ask if there is some work on unboxing constructors?

I can’t remember precisely, but I read a while ago a PR that would allow one to write something like:
Zarith: type t = Small of int [@unboxed] | Big of mpz
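
For context, a small sketch of why this matters: with today’s representation, a constructor carrying an argument is a pointer to a heap block, even when the payload is a plain int. The type below is a stand-in (a `string` replaces the real mpz payload), and `is_boxed` is a helper name made up for this note. The annotation in the example above would presumably let `Small n` share the representation of `n` itself.

```ocaml
(* Stand-in for a Zarith-like type; [string] replaces the real mpz payload. *)
type t = Small of int | Big of string

(* Inspect the runtime representation; Obj is used for illustration only. *)
let is_boxed (v : t) : bool = Obj.is_block (Obj.repr v)

let () =
  (* Without unboxing, [Small 42] is a pointer to a one-field block. *)
  assert (is_boxed (Sys.opaque_identity (Small 42)));
  (* A plain int, by contrast, is an immediate value. *)
  assert (Obj.is_int (Obj.repr (Sys.opaque_identity 42)))
```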

Yes! I’m planning to write about it in the next issue :slight_smile:

(You are thinking of Jeremy Yallop’s RFC, there is no PR yet.)

@recoules the new issue is out, with description of joint work with @nchataing on this issue.