OCaml compiler design and development

As suggested in the “OCaml 4.11, release plan” topic, I am opening a separate topic about the OCaml compiler.

1/ What are the first things to study (documents, papers, files…) to get a true overview of the architecture of the OCaml compiler and then to deeply understand its internals?
Is there a document that explains how its design and development are organized?

In the topic “Documentation about the ocaml compiler?” I found this interesting entry point https://github.com/ocaml/ocaml/blob/trunk/HACKING.adoc referenced by @Drup to start playing with the compiler source code. There is also https://ocamlverse.github.io/content/compiler.html referenced by @bluddy.
That is quite a bunch of documents. Which one should I study first to get the big picture before starting to play with fragments of this very large program? (My short-, medium- and long-term goals are, respectively, to understand the OCaml compiler’s design and development, to suggest changes to it, and to participate in it.)

2/ What kinds of bugs do programmers introduce during development, and what are their root causes?
How far could they be addressed by formal methods? (after all, one son of OCaml is Coq, so…)

3/ Beyond declaring issues on https://github.com/ocaml/ocaml and fixing them as usual in OSS, is there a defined design process, especially with formal/semi-formal requirements, and is there a steering committee?

4/ Concerning possible major OCaml features that have been discussed here and there in the past, such as typed effects, modular implicits, etc., what is the situation? (requested feature, motivation for the feature, expected benefits)
Is MetaOCaml planned to be incorporated into a future OCaml distribution instead of being a set of patches (currently still against the OCaml 4.07.1 distribution)?

I’m not waiting for a “cake ready to eat”, possibly accompanied by a long recipe. This discussion could be organized interactively, starting with the most relevant pointers that experienced people know of, on which I could then give feedback.

Thanks.


You can start with the “Memory Representation of Values” chapter of Real World OCaml, and move on to the remaining chapters in Part III to get this overview.


In fact, these are the only chapters of RWO I haven’t read.
Thanks for the reminder!

It also reminds me of Alex Clemmer’s post http://blog.nullspace.io/beginners-guide-to-ocaml-beginners-guides.html where he wrote nice things about RWO, especially about its technical information: “A whole section on the runtime. A whole section!” (you made me find this old link again).

Thanks.

Hi,

I thought I would share some of my experiences, having just completed some work customising the compiler. This mainly seeks to give some advice and thoughts on your first point, understanding the compiler, as that’s all I feel qualified to answer.

A little background: I was customising the compiler to emit custom instructions for RISC-V - this meant having a fairly good grasp of the overall structure of the compiler pipeline (although not the nitty-gritty details of each transformation).

Understanding the OCaml pipeline:
The first thing I did, thanks to @dra27’s advice, was to think of a new keyword (for me it was camel) acting as a unary operator that simply adds one to a given number (or 2 in OCaml’s tagged-integer representation). In the end I pushed this through the compiler down to the bytecode level. It gave me a good understanding of the general structure of the front-end and middle-end as you pass from intermediate representation (IR) to IR. The marvels of static type-checking mean this is almost road-mapped for you: once you’ve added the new keyword, lots of pattern matches become non-exhaustive, and your job is simply to add an extra case to each (but also to understand what each one is doing).
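
To make the “road-mapped by exhaustiveness” point concrete, here is a minimal toy sketch. The types expr and instr and the constructor Camel are made up for illustration; they are not the real Parsetree or Lambda definitions, and the real compiler has many more passes.

```ocaml
(* Toy illustration only: these are NOT the compiler's own types. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Camel of expr  (* hypothetical new unary construct: adds one *)

(* First "pass": direct evaluation. If the [Camel] case is removed, the
   type-checker warns that this match is not exhaustive. *)
let rec eval = function
  | Int n -> n
  | Add (a, b) -> eval a + eval b
  | Camel e -> eval e + 1

(* Second "pass": lowering to a tiny stack-machine IR. *)
type instr = Push of int | IAdd

let rec lower = function
  | Int n -> [ Push n ]
  | Add (a, b) -> lower a @ lower b @ [ IAdd ]
  | Camel e -> lower e @ [ Push 1; IAdd ]

(* Run the lowered code, to check that both passes agree. *)
let run code =
  let step stack = function
    | Push n -> n :: stack
    | IAdd ->
      (match stack with
       | b :: a :: rest -> (a + b) :: rest
       | _ -> failwith "stack underflow")
  in
  match List.fold_left step [] code with
  | [ v ] -> v
  | _ -> failwith "malformed program"

let () =
  let e = Camel (Add (Int 3, Int 4)) in
  Printf.printf "eval = %d, run = %d\n" (eval e) (run (lower e))
```

Adding a constructor to expr and recompiling lists every function that needs a new case, which is the same experience, on a much smaller scale, as the one described above.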

As part of my CS degree we did a Compiler Construction course (using OCaml) which constructs a very simple (mostly interpreted) language called Slang (https://github.com/Timothy-G-Griffin/cc_cl_cam_ac_uk/tree/master/slang). Looking at the source-code for this might be a more digestible way of understanding the tools like the parser and lexer (.mly and .mll) than jumping in at the deep-end with the OCaml ones.

I think these posts about some of the OCaml internals are great to explain some of the design decisions in terms of data format (https://rwmj.wordpress.com/2009/08/04/ocaml-internals/) as well as the aforementioned Real World OCaml “The Compiler and Runtime System” section. The OCaml manual, which is very thorough, also has a section about interfacing with C which is somewhat useful for understanding the data representations and using some of the runtime code to make this work (https://caml.inria.fr/pub/docs/manual-ocaml/intfc.html).

The compiler itself can print the different IRs (parsetree, typedtree, lambda, clambda, cmm, mach, linear…) through the corresponding -d<IR_name> flag whenever you run ocamlopt. This is a great tool for (1) analysing the different IRs (how data is represented etc.) and (2) understanding what high-level abstractions are being removed from IR to IR.

As a small example, compiling the following with ocamlopt -dclambda -dcmm -c:

let a = 3

Returns

clambda:
(seq (setfield_ptr(root-init) 0 (read_symbol camlTest) 3) 0a)

cmm:
(data)
(data int 1792 global "camlTest" "camlTest": int 1)
(data
 global "camlTest__gc_roots"
 "camlTest__gc_roots":
 addr "camlTest"
 int 0)
(function camlTest__entry () (store val(root-init) "camlTest" 7) 1a)

(data)

There’s quite a lot going on here, but one thing you can note is the translation from the original value 3 to the OCaml tagged value 7, that is (3 << 1) + 1. This is a very small example, but it outlines a general method for understanding how OCaml represents the source code at different levels of abstraction.
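
The 3 to 7 translation can be reproduced with ordinary integer arithmetic. This is only a model of the encoding: the function names tag, untag and tagged_succ are made up here, and the sketch ignores overflow at the top of the integer range.

```ocaml
(* Model of the tagged-integer encoding: the integer n is stored as 2n + 1,
   so the lowest bit is always 1. *)
let tag n = (n lsl 1) lor 1

(* Arithmetic shift right drops the tag bit and keeps the sign. *)
let untag w = w asr 1

(* Adding 1 at the source level means adding 2 to the tagged word. *)
let tagged_succ w = w + 2

let () =
  Printf.printf "tag 3 = %d\n" (tag 3);
  Printf.printf "untag (tagged_succ (tag 3)) = %d\n"
    (untag (tagged_succ (tag 3)))
```

This is also why adding one to a number shows up as adding 2 once you look at the lower IRs.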

The asmcomp directory contains the code for generating assembly (the back-end of the compiler). I think writing small programs and passing the -S compiler flag and analysing the generated assembly is incredibly useful, especially if you then try tracking back up the compiler pipeline to see where the assembly came from (one good example would be the jump tables used for larger pattern matches).
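
A small program for this kind of experiment could look like the sketch below. The file name weekday.ml and the function cost are hypothetical. Whether the generated assembly actually contains a jump table depends on the compiler version, the target architecture and the compiler’s heuristics, so treat that as something to check rather than a guarantee.

```ocaml
(* weekday.ml (hypothetical file name): a match over several constant
   constructors, each arm doing a different computation. *)
type day = Mon | Tue | Wed | Thu | Fri | Sat | Sun

let cost d n =
  match d with
  | Mon -> n + 1
  | Tue -> n * 2
  | Wed -> n - 3
  | Thu -> n * n
  | Fri -> n / 2
  | Sat -> 0
  | Sun -> -n

let () = Printf.printf "%d %d\n" (cost Mon 10) (cost Thu 10)
```

Compiling it with ocamlopt -S weekday.ml leaves a weekday.s file next to the source; searching it for the code of cost and then comparing with the -dcmm output is one way to track the assembly back up the pipeline.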

AFAIK, there’s no single document which describes everything. This is why playing with the compiler is so useful, although it can be intimidating, and it is hard to decide which small piece to understand first. What I’ve tried to show are a few ways of learning about individual parts so that you can start building up the bigger picture, which has worked for me so far.

These are just some of the things that helped me come to grips with the compiler, hopefully it can help you too.

Summary:

  • Adding a custom keyword that does something very simple.
  • Writing a toy-language (say the Lambda Calculus) using OCaml tools really helped me understand the front-end of the compiler.
  • Printing lots of programs using the different IRs (ocamlopt -c -dclambda... myfile.ml)
  • Outputting the generated assembly and inspecting that file before working back through the compiler to understand how your source-code became that assembly (ocamlopt -S myfile.ml)
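
As an illustration of the toy-language suggestion, here is a minimal sketch of a call-by-value evaluator for the lambda calculus extended with integers. It is not Slang and not taken from the course mentioned above; all type and constructor names are made up, and a fuller exercise would add a lexer and parser (.mll and .mly files) in front of it.

```ocaml
(* Abstract syntax of a tiny language: lambda calculus plus integers. *)
type term =
  | Var of string
  | Lam of string * term
  | App of term * term
  | Lit of int
  | Plus of term * term

(* Values are integers or closures (a function with its environment). *)
type value =
  | VInt of int
  | VClosure of string * term * env
and env = (string * value) list

let rec eval env = function
  | Var x ->
    (match List.assoc_opt x env with
     | Some v -> v
     | None -> failwith ("unbound variable " ^ x))
  | Lam (x, body) -> VClosure (x, body, env)
  | Lit n -> VInt n
  | Plus (a, b) ->
    (match eval env a, eval env b with
     | VInt m, VInt n -> VInt (m + n)
     | _ -> failwith "Plus expects integers")
  | App (f, arg) ->
    (match eval env f with
     | VClosure (x, body, closure_env) ->
       let v = eval env arg in
       (* The body runs in the closure's environment: lexical scoping. *)
       eval ((x, v) :: closure_env) body
     | VInt _ -> failwith "cannot apply an integer")

let () =
  (* (fun x -> x + 1) 41 *)
  let prog = App (Lam ("x", Plus (Var "x", Lit 1)), Lit 41) in
  match eval [] prog with
  | VInt n -> Printf.printf "%d\n" n
  | VClosure _ -> print_endline "<fun>"
```

Using environments and closures avoids having to implement capture-avoiding substitution, which keeps the sketch short.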

I’m happy that the OCaml compiler daily and reliably delivers executable files from OCaml expressions.
But I was staring at the compiler as a black box, really wondering whether I could start to visit this cathedral. Reading the source files directly, without a big picture in mind, seemed to be a long and hard journey.

Thank you for your very valuable and operational explanation.

PS: the -dparsetree, -dtypedtree,… options are not in man ocamlopt (which I read first); they only appear, marked (undocumented) and sorted by name, when an incorrect ocamlopt command is entered.

Note: HACKING.adoc and CONTRIBUTING.adoc in the compiler sources are living documents. If there are important things that are missing from them, we can add them. For example, there is a section on the compiler’s overall structure (which mostly lists the subdirectories), which can be completed if people want to add more information. (We also want to keep the whole thing reasonably short and readable, so some things may be better branched to sub-files, but you get the idea.)


You should have a look at the bugs, or some of the bugs, fixed in the last release. They are of various kinds.

Some bugs could be prevented by certified programming in the OCaml source of the compiler, or the standard library; this sounds like it would be a lot of work. For starters, there are no formal-verification tools for OCaml programs today. (Should you port parts of the compiler to Why3, or prove them correct using CFML, or write a formal verifier for static contracts/assertions in OCaml programs? In any case, years of work ahead.)

Some bugs are in the runtime, and some of them could possibly be fixed by using a program verifier for C, or by moving parts of the runtime to a safer language (maybe Rust, but then the dependency chain would be huge, so this may be rejected for maintainability/portability reasons). Again, years of work. Some FFI bugs could be caught by a dedicated verification tool, which is less work than general program proof; see the previous work on the Saffire FFI checker, which is sadly unused today.

I believe that many bugs could be avoided by “better software engineering”, which is the easiest formal method around: making the code nicer in the first place, using types to enforce more static properties, etc. For example, for about a year I’ve been working with @trefis to refactor the pattern-matching compiler: #9321. (It’s not strictly about removing bugs, but rather about trying to make it more maintainable.) @octachron has been doing great work reviewing our PRs in the last couple of months, so that we were able to merge more than half of it upstream in the compiler. (And @octachron got to learn a lot about how the pattern-matching compiler works in the process.) If anyone wants to contribute to the reviewing, and learn things about how the compiler works, they are more than welcome to jump in and help in the review process!

Last time I asked Oleg (the author and maintainer) a couple years ago, he preferred to not have BER MetaOCaml upstreamed, because he was still unsure about some design decisions. Other people may know more about the current situation.

Have you considered helping to update it to a more recent version of OCaml? That sounds like a great way to learn a lot about the implementation in a short amount of time. (I would guess that the easiest way to do it is to migrate one version at a time: 4.08, then 4.09, then 4.10.)

MetaOCaml integration was discussed with Oleg a few months ago, and he was mostly interested in having syntactic hooks in the compiler to make porting MetaOCaml easier, rather than a full-blown integration.


Thanks for all these details.

As an OCaml beginner, I remember reading about an error in the Scanf module that has since been fixed (maybe it was this article: https://www.ocamlpro.com/2015/04/13/yes-ocp-memprof-scanf/), and about the Marshal module, which is not type-safe but can nevertheless be used in a safe manner.
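
One common discipline for using Marshal carefully is sketched below. The type config and the functions save and load are made up for illustration. Marshal.from_string returns a value of any type the caller asks for, so the idea is to hide both directions behind functions with fixed types; this is only a convention and does not protect against data written by another program, another type definition, or an untrusted source.

```ocaml
type config = { name : string; retries : int }

(* Both directions are pinned to [config] by the type annotations, so
   callers can never ask [Marshal.from_string] for the wrong type. *)
let save (c : config) : string = Marshal.to_string c []
let load (s : string) : config = Marshal.from_string s 0

let () =
  let c = { name = "demo"; retries = 3 } in
  let c' = load (save c) in
  Printf.printf "%s %d\n" c'.name c'.retries
```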

As an ordinary OCaml programmer, my main concerns are:

  • can my program crash? (at least I can run it again, hoping that it’s not a reproducible crash)
  • if it doesn’t crash, and if it does terminate, is its result correct?

Should I understand that errors in the OCaml code base (AKA “bugs”) mainly cause program crashes or incorrect results? (I know, it’s an open and difficult question).
EDIT: considering that “Well-typed programs do not go wrong”. (R. Milner).

In 2014, @talex5 published an interesting and sincere “gallery of OCaml/Lwt/Curl/GTK bugs”: https://roscidus.com/blog/blog/2014/01/07/ocaml-the-bugs-so-far/ .
Yes, ---- happens.
So, driving carefully with a good car (active safety), fastening our seat belts and having airbags (passive safety) should be our first concern before dreaming of a futuristic car.

Concerning the application of the available formal methods and tools to the OCaml compiler itself, I see the benefits, and I suppose it would be hard to change something that has been running for a long time (as usual in many domains).
I can just ask another question:
In the long term, does anyone think it’s feasible/realistic to gradually integrate more formal methods into the OCaml engineering process? Or is that kind of transformation such a cultural and technical breaking change that it’s intrinsically a quantum leap?

Rust has a really awesome book that explains all the internals of the Rust compiler (well, the goal is to explain all the internals, it’s still a work in progress).
There’s also a lot of incentive for the community to participate in the development such as ICE-breaker teams, that help fix medium-priority bugs, and which are non-engaging.

In general, there’s a lot of nice things done by the Rust community which could inspire in part the OCaml community.

However, in balance, OCaml is much more stable than Rust thanks to its decades of existence (and in my opinion its much much simpler semantics…). So such things are maybe not as useful.
Also, maintaining a dev guide + community teams is a HUGE task. Like really really big.
And to be honest, I find the OCaml team very efficient, I’ve never encountered a bug that blocked anything, I never feel like I’m waiting for a feature to finally come up like I often see people doing in the Rust community or JS community.

Anyway, all that to say: it would be great to have a dev guide for interested people, but on balance it might be a little too much work for the compiler team.


The question of the types of bugs in the compiler distribution is interesting. Some language communities have a language implementation that is not robust enough, and the result is that when something goes wrong, users have to deal with the unpleasant doubt: “maybe it is a compiler bug?”. In my experience (but people would be welcome to propose different experiences), this is not the case in OCaml: it is so rare that a misbehavior can be tracked to a compiler defect, rather than a program defect, that people never have to work with the (rather painful) lingering doubt that it may be a compiler problem. (Sometimes it is! This is like CPU bugs: it never is a bug in your CPU, even though sometimes it is in fact a bug in your CPU.)

Looking at the Bug fixes section of the current state of the 4.11 release, 21 issues are reported. I did a quick experiment of clustering them in general categories:

  • I would classify 8 issues as about “user interface glitches” from the compiler or tools. (For example, exception backtraces being printed differently in native and bytecode mode in some cases.) In general a lot of development effort goes into the user interface of the compiler tools and runtime (error messages, collaborating well with the build system, debugging, etc.)
  • 7 bugs only happen in what I would call “weird environments” that are less tested. One bug would crash all programs linked against the musl libc on an architecture different from amd64. There was a Makefile variable naming conflict that affects Gentoo users. Etc.
  • 4 bugs result in weird static checking (program rejected when it could be accepted, or sometimes the other way around) for “weird programs” that use complex type-system features in less-well-understood ways. Three of those four involved recently-introduced features (empty variant type, anonymous modules, type 'a t := foo).
  • finally, 2 bugfixes could possibly affect the execution of a “typical” OCaml program (it does not rely on bleeding-edge or arcane features). But in both cases users would recognize that their program was not well-written in the first place, instead of blaming the language implementation.

The two changes in the final category were:

  • if you write a record-expression { e with x = foo; y = bar } but you list all the fields (in this example the record has no field other than x and y), then e would be silently ignored instead of being evaluated. Weird, but not shocking either, and it’s probably a better idea to remove e with altogether to make your program unsurprising.
  • #7683: if you do let r = ref 0 in f (incr r) !r (incr r), the order of evaluation of the increments and the dereference would be different with the bytecode and native-code compilers. It is possible that a user would test their program only in native code, and then later realize that it crashes in bytecode due to this difference. (Or flambda vs. non-flambda; flambda behaved as bytecode here.) I think this is the 4.11 “bug” that is most likely to bite users in practice, yet most code authors would probably avoid doing side effects in arguments in this way, and would not blame the implementation if the result is surprising. (In fact either behavior was correct with respect to the language specification, but we made a change to ensure only one of the behaviors happens, to avoid surprises.)
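
The evaluation-order case can be tried directly. In the sketch below, f, fragile and robust are made-up names; the value printed for fragile depends on the compiler and backend, since the order in which arguments are evaluated is left unspecified, while robust fixes the order with explicit let bindings.

```ocaml
(* [f] ignores its unit arguments and returns the middle one. *)
let f () x () = x

(* Relies on the evaluation order of the arguments, which is unspecified:
   depending on the compiler, the result may be 0, 1 or 2. *)
let fragile () =
  let r = ref 0 in
  f (incr r) !r (incr r)

(* Sequencing with [let] makes the intended order part of the program. *)
let robust () =
  let r = ref 0 in
  let () = incr r in
  let x = !r in
  let () = incr r in
  f () x ()

let () =
  Printf.printf "fragile = %d (implementation-dependent)\n" (fragile ());
  Printf.printf "robust  = %d\n" (robust ())
```

Writing the side effects as separate let bindings, as in robust, is the style that avoids the surprise regardless of which backend compiles the program.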

There is no steering committee, but the people making integration decisions are the maintainers of the compiler distribution (which evolves in a fairly similar way to “who has commit bits” in most open source projects: from time to time we invite active contributors to join), with Xavier Leroy playing the role of a benevolent-grumpy dictator.

@avsm recently gave a good description of (some approximation of) a language design process:

(I say “approximation” because for now RFCs are experimental; they are mostly used as a tool for the language maintainers themselves, and it’s not working very well. Personally I wouldn’t encourage people to rely on it before we get better at handling RFCs; and even then, we would have a problem if more RFCs were submitted than there are people doing the work of reviewing and giving feedback, which is the state of the implementation right now.)

If you want to learn to swim in the OCaml compiler distribution codebase, I think that the best approach is to pick some task related to it, and try to do it. (I suggested for example working on upgrading BER MetaOCaml to a more recent version of OCaml, based on your remark that it was stuck on an older version; but really any task would do.) If you manage to finish your task, great. If you don’t, you will still have learned a lot in the process, so it will be a success.

One caveat: I would not recommend trying to implement a new syntax or feature, because those are rarely accepted (fortunately). It is easier to work on something for which there is clear consensus that we want it, such as fixing a bug, upgrading some piece of software, etc.

Finally: there are plenty of other interesting and/or important codebases in the OCaml ecosystem, that are looking for contributors – not just the compiler distribution.


Is this something the portion of the community outside of the compiler maintainers can help with? I’m happy that the RFC process is being tried, as it’s helpful and interesting to see some of the design and thought process that goes into new compiler features.

It’s also unclear where/when community members who are not going to be directly involved in the implementation should get involved in the RFC process. Do you have a sense yet of what you’d like the balance to be, so that the feedback is helpful but doesn’t get in the way of implementors’ discussions?

Yes of course, but as you point out it all depends on the signal/noise ratio. Some people in the community have excellent feedback to give, or excellent change proposals to design and propose to implement. There is also some less-useful feedback that counter-productively increases the mental load. So far we have seen very little of the second category, but nobody knows how it will go. Languages that have a very open RFC process (C#, Python, Rust) have a hard time dealing with the signal/noise ratio and it consumes a lot of energy from people that have more workforce than we do (but also they have more signal and more noise).

My personal impression is that it is too early to put high hopes in, or too much pressure on, the RFC process. Ideally I would hope that it gets traction among core/frequent contributors first (irrespective of whether people have commit rights or not), before we try to broaden the scope. On the other hand, more help reviewing PRs is always (needed and) welcome, and that process is always well established and understood. (Thanks @hcarty for your reviewing help on stdlib PRs!)


Your categorization is very instructive.

Thanks for offering us more visibility. That clarifies several things (at least for me), and it gives concrete ways to start becoming a little more of an actor instead of remaining a consumer, such as reviewing PRs or upgrading BER MetaOCaml, which I had not even allowed myself to imagine taking part in. I feel that there is a lot to learn there. It’s encouraging.

I see that ocaml.4.04.2 sends a Warning 23 (“all the fields are explicitly listed in this record: the ‘with’ clause is useless.”) and doesn’t evaluate correctly, while ocaml.4.07.1 warns and evaluates correctly the expression { e with x = foo; y = bar }.

I understand and believe that main troubles can be prevented with “good/relevant” programming style and software engineering practices (as well as organization&collaboration…).
It’s not easy to define what is a “good/relevant” programming style.

Your words suggest a few rules of thumb to me:

  • “Follow a programming style that avoids exotic expressions”.
    Possible criterion: does this expression look like it’s playing with some language boundaries? (still informal, but can produce some hints)
  • “Stick to this programming style; reuse as much as possible previously tried-and-tested expressions”.
  • “Specify as much as possible” (e.g. : module signatures, of course ; and also pre-conditions/ invariants/post-conditions; there is a lot there to avoid putting a program in bad situation - it’s a huge domain)
  • “Define exhaustive tests and execute these tests, as automatically as possible” (testing has one advantage: it forces specifying before programming)
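
The last two rules can be illustrated on a very small scale. The module Counter below is a made-up example: its signature is the specification, the comments state a precondition and an invariant, and the final block plays the role of an automated test. A real project would more likely use a test framework, and these few assertions are of course not exhaustive.

```ocaml
(* Specification: an abstract counter that never goes below zero. *)
module type COUNTER = sig
  type t
  val zero : t
  val incr : t -> t
  val decr : t -> t  (* precondition: the counter is positive *)
  val to_int : t -> int
end

module Counter : COUNTER = struct
  type t = int  (* invariant: always >= 0 *)
  let zero = 0
  let incr n = n + 1
  let decr n =
    (* Check the precondition instead of silently breaking the invariant. *)
    if n <= 0 then invalid_arg "Counter.decr: counter is zero";
    n - 1
  let to_int n = n
end

(* Tests written against the signature only. *)
let () =
  let open Counter in
  assert (to_int zero = 0);
  assert (to_int (decr (incr zero)) = 0);
  assert (match decr zero with
          | _ -> false
          | exception Invalid_argument _ -> true);
  print_endline "all counter tests passed"
```

Because t is abstract, code outside the module cannot build a negative counter, so the invariant only has to be checked inside Counter itself.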

That’s obviously not enough, because many requirements could also be discussed about how collaboration should be organized, the level of automation, etc. I don’t intend to dictate how people should work (!). I just wanted to share what your answers suggested to me about my own practice.
Thanks.

I concur (this is how I learned it as well). A way to have a direct discussion with someone knowledgeable is also very useful. Not everything is well documented and you will get stuck at some point. Having someone walk you through it can be very helpful. There are many compiler people on IRC/Discord who can answer questions.

As a complement to @gasche’s remarks, another very good idea is to improve error reporting. The difficulty of implementation varies from very easy to very difficult, and it has a fairly high chance of being merged.

Yes! And @Drup and @octachron happen to have an error-reporting PR right now that is stuck for lack of feedback: Semantic diffings for functor types and applications #9331. Anyone willing to look at pieces of this? (This is a big patch, but looking at just a couple of commits would already be helpful.)
