Compiler reproducibility and OCaml

silene · June 19, 2026, 6:21am

Your guess is as good as mine. One of the repositories has branches that are able to bootstrap OCaml 4.09.1 and 4.12.0, while the other repository only supports 4.07.1.

Sure. And you can even go further than just auditing. You can mathematically prove that it is correct. (Myreen did so for LISP on Arm, PowerPC, and x86 about 20 years ago.)
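For readers wondering what "proving a compiler correct" actually states, here is a toy version in OCaml. It is only a sketch and not Myreen's development: the source language (arithmetic expressions), the stack machine and all names (`eval`, `compile`, `run`, `correct`) are made up for illustration. The theorem one would prove is that running the compiled code always leaves exactly the value of the source expression on the stack; here that property is only checked on examples.

```ocaml
(* Source language: arithmetic expressions. *)
type expr = Num of int | Add of expr * expr | Mul of expr * expr

(* Target language: instructions for a simple stack machine. *)
type instr = Push of int | IAdd | IMul

(* Reference semantics: what a source program is supposed to mean. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

(* The compiler: post-order traversal emitting stack-machine code. *)
let rec compile = function
  | Num n -> [ Push n ]
  | Add (a, b) -> compile a @ compile b @ [ IAdd ]
  | Mul (a, b) -> compile a @ compile b @ [ IMul ]

(* The target machine: run the code starting from an empty stack. *)
let run code =
  let step stack instr =
    match instr, stack with
    | Push n, s -> n :: s
    | IAdd, y :: x :: s -> (x + y) :: s
    | IMul, y :: x :: s -> (x * y) :: s
    | (IAdd | IMul), _ -> failwith "stack underflow"
  in
  List.fold_left step [] code

(* The correctness statement, for one expression. A proof would show it
   holds for every [e]; here we can only test it. *)
let correct e = run (compile e) = [ eval e ]
```

A machine-checked proof would typically go by induction on the expression, after generalizing `run` to start from an arbitrary stack, so that the `Add` and `Mul` cases can reuse the induction hypotheses for both operands.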

That gets us closer, and as you said, the next step to further increase trust is to look at the silicon. This becomes an interesting challenge: how do you design a chip simple enough to be visually reviewed by a human, yet fast enough to run a modern compiler? I suppose an FPGA would fit the bill since, despite its huge size, its regularity would make it auditable by a human. But I don’t remember ever reading about an FPGA that could flash itself (hence allowing bootstrap of the chip), so I suppose it is still an open problem (either that or people do not care).

conroj · June 19, 2026, 11:23am

Based on the unresolved status of this issue, I think the work on versions after 4.07 is probably unfinished, but I’d be happy to be wrong about that.

hyphenrf · June 19, 2026, 12:00pm

Isn’t Guile bootstrappable all the way down to Mes, thanks to the monumental effort of the Guix people? I believe this is exactly why Camlboot chose Guile to begin with.

conroj · June 19, 2026, 12:42pm

Based on this blog post and page 8 of the Camlboot paper, I think debootstrapping Guile is still a work in progress.

Mes has successfully debootstrapped other things, but I believe those things are all C projects (Mes is actually two “mutually self-hosting” tools in one: a Scheme interpreter and a “C” compiler.) The best explanation I’ve seen for how this works is here: starting from a trivial assembler (a 357-byte binary seed) and a trivial “C” compiler, one can bootstrap the Mes Scheme interpreter, and from that, the Mes “C” compiler. Then you can bootstrap increasingly sophisticated C compilers.
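To give an idea of how little the first seed has to do, here is a model in OCaml of a hex0-style assembler. It is an illustration only: the real seed is hand-written machine code, not OCaml, and the conventions assumed here (pairs of hex digits become bytes, `#` and `;` start a comment running to the end of the line, anything else is ignored) are my understanding of the stage0 hex0 format rather than a specification. The names `hex_value` and `assemble` are hypothetical.

```ocaml
(* Value of a hexadecimal digit, or None if [c] is not one. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

(* Assemble hex0-style text into raw bytes. Pairs of hex digits become one
   byte; '#' and ';' start a comment that runs to the end of the line; any
   other character is skipped. A trailing unpaired digit is dropped. *)
let assemble (text : string) : string =
  let out = Buffer.create (String.length text / 2) in
  let in_comment = ref false in
  let high = ref None in
  String.iter
    (fun c ->
      if !in_comment then (if c = '\n' then in_comment := false)
      else if c = '#' || c = ';' then in_comment := true
      else
        match hex_value c, !high with
        | None, _ -> ()
        | Some lo, Some hi ->
            (* Second digit of a pair: emit the byte. *)
            Buffer.add_char out (Char.chr ((hi lsl 4) lor lo));
            high := None
        | Some hi, None -> (* First digit of a pair: remember it. *) high := Some hi)
    text;
  Buffer.contents out
```

For example, `assemble "48 65 # comment 12\n6c6C ; x\n6f"` yields the five bytes of `"Hello"`; the digits inside the comments are ignored.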

gasche · June 20, 2026, 10:13am

Note: I would make a distinction between “reproducibility”, which I view as the ability to build equivalent binaries of a program from different build setups, and “debootstrapping”, which is the more demanding task of building the software without relying on opaque, generated binaries (or any other form of non-human-reviewable data). Some people in the OCaml community (and outside, of course) have worked on both aspects, but reproducibility is easier to achieve and relevant to many practical situations (e.g. whether binary caches can be trusted), whereas debootstrapping is harder and of more niche interest in the overall community, so it is useful to distinguish the two.
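As a small illustration of the reproducibility side, here is an OCaml sketch of the final check, taking "equivalent binaries" to mean bit-for-bit identical files, which is what reproducible-builds efforts usually aim for. The function names are hypothetical. `Digest` is the standard library's MD5 module, which is fine for spotting accidental differences; a real audit would rather use a cryptographic hash such as SHA-256. Note that passing this check says two build setups agree, not that either of them is free of opaque inputs, which is exactly the debootstrapping question.

```ocaml
(* Compare two build outputs, e.g. the same compiler built on two machines. *)
let same_artifact path_a path_b =
  Digest.equal (Digest.file path_a) (Digest.file path_b)

(* Offset of the first byte where two artifacts differ, if any. A length
   mismatch is reported at the end of the shorter one. *)
let first_difference a b =
  let n = min (String.length a) (String.length b) in
  let rec go i =
    if i = n then (if String.length a = String.length b then None else Some n)
    else if a.[i] <> b.[i] then Some i
    else go (i + 1)
  in
  go 0

(* Read a whole file as a string (binary mode, so no newline translation). *)
let read_file path =
  let ic = open_in_bin path in
  let s = really_input_string ic (in_channel_length ic) in
  close_in ic;
  s
```

When the digests differ, the offset of the first differing byte is often the quickest hint; embedded timestamps and build paths are classic culprits.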

(Disclaimer: I’m one of the people who worked on Camlboot.)

Yes, currently the known-to-work-well version of Camlboot is the 4.07 one. We have started experiments to support more recent OCaml versions but they are mostly unfinished. (It is a bit tedious, and no one has set this as one of their priorities in life.)
(The link was already given, but I would recommend the Camlboot paper as a source of documentation and explanations about Camlboot.)

I think that Camlboot solves a security problem (for OCaml 4.07), which is to ensure that the bootstrap (bytecode) binaries distributed with the OCaml compiler are not malicious, that is, that they were not tampered with to include a backdoor. We didn’t think they were, but the development processes around the bootstrap binaries are not very rigorous from a security perspective (they mostly rely on trusting people), so it is good to check this for sure from time to time.
As @silene pointed out, an attack could come from other parts of the ecosystem, but the part that I feel co-responsible for, and wanted to check, was specifically the bootstrap binaries.
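For those wondering how rebuilding through an independent path can expose a tampered binary, here is a toy model in OCaml. It is deliberately simplistic and is not how Camlboot is implemented (the paper describes the actual chain); it only illustrates the general idea, which is close to David A. Wheeler's "diverse double-compiling". A compiler binary is modelled by its bytes, which we can compare, and its behaviour, which we can run. All names (`compiler_src`, `honest`, `evil`, `independent`, `self_check`, `ddc_check`) are hypothetical.

```ocaml
(* A compiler binary: its bytes (what we can compare) and its behaviour
   (what it produces when given some source code). *)
type binary = { bytes : string; compile : string -> binary }

(* Hypothetical source code of the compiler itself. *)
let compiler_src = "let compile src = emit (parse src)"

(* An honest build: the bytes are a deterministic function of the source,
   and the result behaves like the honest compiler again. *)
let rec honest src = { bytes = "obj(" ^ src ^ ")"; compile = honest }

(* A trojaned compiler that recognizes its own source and re-inserts the
   payload, so the payload survives every self-compilation. *)
let rec evil src =
  if src = compiler_src then
    { bytes = "obj(" ^ src ^ ")+payload"; compile = evil }
  else honest src

(* An independently written compiler: it emits different bytes, but (in
   this toy, where we only ever feed it [compiler_src]) what it builds
   behaves like the honest compiler. *)
let independent src = { bytes = "other-obj(" ^ src ^ ")"; compile = honest }

(* Naive check: does the distributed binary reproduce itself? *)
let self_check boot = (boot.compile compiler_src).bytes = boot.bytes

(* Double-compiling check: build the compiler with the independent
   compiler, use that result to build the compiler again, and compare the
   bytes with the distributed binary. *)
let ddc_check boot =
  let stage1 = independent compiler_src in
  let stage2 = stage1.compile compiler_src in
  stage2.bytes = boot.bytes
```

Both `honest compiler_src` and `evil compiler_src` pass `self_check`, which is why self-reproduction proves nothing on its own; only the rebuild through the independent compiler separates them. The comparison also relies on the build being deterministic, which is where reproducibility comes back in.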

Regarding Scheme (Guile) versus C: I prefer to use Scheme to implement programming languages; it is more fun to write a compiler in Scheme than in C. I don’t think that Scheme is harder than C to debootstrap, and the sociological fact that the debootstrapping people are in the Guix community, which is also full of Guile hackers, makes me confident that, over time, Guile is a good language to use for debootstrapping. In fact, I would now even prefer OCaml 4.07 over Guile to implement a new programming language, so I think future languages could use OCaml if they wish, although that makes the dependency chain longer than using C or Scheme directly.

raphael-proust · June 22, 2026, 7:45am

yep it could interfere. but to do what?

ok so say you are a tainted compiler and you’ve detected that you are compiling a compiler for another language. you don’t know what interface this compiler has with the world, you don’t know how it gets its inputs, you don’t know what its inputs look like, you don’t know how to recognise an interesting payload to poison… like how do you detect you are compiling login or some other interesting program to hijack?

if all you achieve is “all compilers have this little bit of self-replicating code inside of them” then i guess that’s an achievement. but the thought experiment is scary because it adds a bug to the login program, which is used for security. the self-replicating part is not specifically scary on its own.
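To make the point about recognizing the target concrete, here is a toy OCaml sketch of the kind of recognizer a Thompson-style trojan needs. It assumes, as in Thompson's original description, that the trojan matches a fragment of the target's source text; the fingerprint, the injected line and the names `contains` and `trojan_pass` are invented for the example. A slightly different way of writing the same check is enough to slip past the recognizer.

```ocaml
(* Naive substring search, enough for the sketch. *)
let contains ~sub s =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go 0

(* Hypothetical fingerprint: a fragment the attacker expects to find in the
   source of login. *)
let login_fingerprint = "strcmp(crypt(pw, salt), hash) == 0"

(* Hypothetical backdoor: accept a fixed master password. *)
let backdoor = "if (strcmp(pw, \"letmein\") == 0) return 1;\n"

(* The trojan's source-to-source step: prepend the backdoor (placement is
   simplified) to anything that looks like login, pass the rest through. *)
let trojan_pass src =
  if contains ~sub:login_fingerprint src then backdoor ^ src else src
```

With the exact source the attacker studied, `trojan_pass` fires; with `return !strcmp(crypt(p, salt), hash);` instead, it silently does nothing.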

conroj · June 22, 2026, 10:34am

I think this argument makes sense if one assumes that the attack is meant to be indiscriminate. But take the recent example of the XZ Utils backdoor (not a trusting-trust attack, but another supply chain attack): there is fair evidence that the attacker targeted one specific type of system and tailored their payload accordingly.
