Readable ML (OCaml, SML, etc.) compilers?

What would you recommend to someone who only took a compiler course in college?

It doesn’t have to be full-featured.

2 Likes

“Modern Compiler Implementation in ML” by Andrew Appel is a good choice.

8 Likes

Check out MinCaml (https://github.com/esumii/min-caml) and the accompanying paper: https://esumii.github.io/min-caml/paper.pdf.

Cheers,
Nicolas

5 Likes

For readability, I recommend Andreas Rossberg’s HaMLet, which implements Standard ML in a way that follows the structure of the Definition [pdf] rule-by-rule.

For example, here’s the rule for typing handle clauses (the equivalent of OCaml’s try ... with):

    | elabExp D (C, HANDLEExp(exp, match)@@A) =
      (* [Rule 10] *)
      let
        val tau       = elabExp D (C, exp)
        val tau_match = elabMatch D (C, match)
      in
        Type.unify(Type.fromFunType(InitialStaticEnv.tauExn, tau), tau_match)
          handle Type.Unify =>
            error(loc A, "type mismatch in handler");
        tau
      end

If you compare this with the corresponding rule in the Definition:

    C ⊢ exp ⇒ τ        C ⊢ match ⇒ exn → τ
    ─────────────────────────────────────────   (10)
        C ⊢ exp handle match ⇒ τ

then you’ll find a clear correspondence between the parts:

  • elaborating exp in context C produces a type tau
  • elaborating match in context C produces another type tau_match
  • tau_match must equal exn -> tau, so enforce that by unification (raising a type error if the two cannot be made equal)
  • the result of elaborating the whole phrase (exp handle match) is tau
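The same pattern — elaborate the subphrases, then constrain them with unification — is easy to reproduce in miniature. Here is a small self-contained sketch (not HaMLet's actual code; the type representation, `repr`, and `elab_handle` are illustrative names, and the occurs check is omitted):

```ocaml
(* Toy types with mutable type variables, in the usual destructive style. *)
type ty =
  | TVar of tv ref
  | TCon of string            (* e.g. "exn", "int" *)
  | TArrow of ty * ty
and tv = Unbound of int | Link of ty

exception Unify

(* Follow Link chains to the representative, with path compression. *)
let rec repr t =
  match t with
  | TVar ({ contents = Link t' } as r) ->
      let t'' = repr t' in r := Link t''; t''
  | _ -> t

let rec unify t1 t2 =
  match repr t1, repr t2 with
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TVar r, t -> r := Link t            (* no occurs check in this sketch *)
  | t, TVar r -> r := Link t
  | TCon a, TCon b when a = b -> ()
  | TArrow (a1, b1), TArrow (a2, b2) -> unify a1 a2; unify b1 b2
  | _ -> raise Unify

let fresh =
  let n = ref 0 in
  fun () -> incr n; TVar (ref (Unbound !n))

let tau_exn = TCon "exn"

(* Rule 10: if exp : tau and match : exn -> tau,
   then (exp handle match) : tau. *)
let elab_handle tau tau_match =
  unify (TArrow (tau_exn, tau)) tau_match;
  tau
```

For instance, `elab_handle (TCon "int") (fresh ())` instantiates the fresh variable to `exn -> int` and returns `TCon "int"`, while `elab_handle (TCon "int") (TCon "bool")` raises `Unify` — the sketch's analogue of HaMLet's "type mismatch in handler" error.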

11 Likes

I’ve always found the original Caml Light sources very readable (and yet not cited very often). The entire project is incredibly practical: the lexer generator is straightforward (some of the comments, citing chapters of the Dragon Book, persist in the ocamllex implementation to this day!), the parser generator is a modification of byacc, the type inference algorithm is effectively Algorithm J (with Rémy’s levels), etc.

Of course, there are still many concepts in the compiler where you’d need some background. For example, the general ideas behind the Hindley-Milner implementation (destructive unification with Rémy’s levels) are covered in a few notable places (“Le langage Caml” (French), “The Functional Approach to Programming”, and, more accessibly, Oleg Kiselyov’s blog article on the topic). That said, once you know these concepts, they’re very easy to identify in the Caml Light implementation.
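The core of the levels idea fits in a few dozen lines. Below is a toy sketch of destructive unification with Rémy’s levels (all names are illustrative, not Caml Light’s; the occurs check is omitted): each unbound variable records the `let`-nesting level at which it was created, unification sinks variables to the outermost level involved, and at generalization time only variables deeper than the current level may be quantified.

```ocaml
type ty =
  | TVar of tv ref
  | TArrow of ty * ty
  | TInt
and tv =
  | Unbound of int * int   (* id, level *)
  | Link of ty

let current_level = ref 1
let enter_level () = incr current_level   (* entering a let's right-hand side *)
let leave_level () = decr current_level

let fresh_id = let n = ref 0 in fun () -> incr n; !n
let new_var () = TVar (ref (Unbound (fresh_id (), !current_level)))

(* Lower every unbound variable in t to at most lvl. *)
let rec update_level lvl = function
  | TVar ({ contents = Unbound (id, l) } as r) ->
      if l > lvl then r := Unbound (id, lvl)
  | TVar { contents = Link t } -> update_level lvl t
  | TArrow (a, b) -> update_level lvl a; update_level lvl b
  | TInt -> ()

let rec unify t1 t2 =
  match t1, t2 with
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TVar { contents = Link t1' }, _ -> unify t1' t2
  | _, TVar { contents = Link t2' } -> unify t1 t2'
  | TVar ({ contents = Unbound (_, lvl) } as r), t
  | t, TVar ({ contents = Unbound (_, lvl) } as r) ->
      update_level lvl t;       (* vars in t sink to the outer level *)
      r := Link t
  | TArrow (a1, b1), TArrow (a2, b2) -> unify a1 a2; unify b1 b2
  | TInt, TInt -> ()

(* A type may be generalized at a let only if all its unbound
   variables live strictly deeper than the current level. *)
let rec generalizable = function
  | TVar { contents = Unbound (_, l) } -> l > !current_level
  | TVar { contents = Link t } -> generalizable t
  | TArrow (a, b) -> generalizable a && generalizable b
  | TInt -> false
```

The point of the levels is to make generalization O(size of type) instead of scanning the whole environment: a variable created inside the `let` being elaborated (`fun x -> x` gives `'a -> 'a` with `'a` at the inner level) is generalizable, but a variable that got unified with something from an enclosing scope has been sunk to the outer level and is not.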

If your intention is to implement a compiler for a language similar to SML or OCaml, there is a wealth of different resources (books, papers, etc.) that are useful for different parts of the compiler. You just need to mentally delineate the necessary steps (parsing, type checking, normalisation, match compilation, closure conversion, hoisting, further lowering, instruction selection, register allocation, etc.), then seek out resources about them. Many of the front-end and later back-end stages are documented in classical compiler textbooks, but various ideas around functional (middle-end) intermediate representations and their transformations exist in more obscure literature (e.g. Appel’s “Compiling with Continuations”, Tarditi’s PhD thesis, various papers by OCaml contributors and their academic collaborators, papers around MLton, SML/NJ, TIL, MLj, Manticore, MLRISC, sml2c, MLtoAda, etc.).
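One way to keep that mental delineation honest is to give each stage its own IR type, so the pipeline is just typed function composition. A deliberately trivial sketch (every type and stage name here is a placeholder, not any real compiler's API):

```ocaml
(* Each type stands in for a real intermediate representation. *)
type source = string
type ast = Ast of string
type typed_ast = Typed of ast
type anf = Anf of typed_ast          (* after normalisation *)
type clos = Clos of anf              (* after closure conversion *)
type asm = Asm of string

(* Stub stages: each one only consumes the previous IR and
   produces the next, so mis-ordered pipelines don't typecheck. *)
let parse (s : source) : ast = Ast s
let typecheck (a : ast) : typed_ast = Typed a
let normalise (t : typed_ast) : anf = Anf t
let closure_convert (a : anf) : clos = Clos a
let select_instructions (Clos (Anf (Typed (Ast s))) : clos) : asm =
  Asm ("code for " ^ s)

let compile (s : source) : asm =
  s |> parse |> typecheck |> normalise |> closure_convert
    |> select_instructions
```

The payoff of distinct IR types is that the resources mentioned above map one-to-one onto stage boundaries: you can swap in a real match compiler or register allocator behind one of these arrows without disturbing the rest.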

I’ve also found MLton’s Compiler Overview pages to be rather useful as documentation of many of the IRs, transformations, and ideas implemented in MLton.

So, if you get to the point of requiring supplementary resources, you should ask around. There’s no single textbook, paper, or codebase that would give you a full picture of the rabbit holes in compiler engineering.

7 Likes

The MLWorks system has a compiler which is IMO fairly readable, though probably not as much as HaMLet. The typechecker at least is structured very closely around the Definition.

2 Likes