Readable ML (OCaml, SML, etc.) compilers?

What would you recommend to someone who only took a compiler course in college?

It doesn’t have to be full-featured.

2 Likes

“Modern Compiler Implementation in ML” by Andrew Appel is a good choice.

8 Likes

Check out MinCaml (https://github.com/esumii/min-caml) and the accompanying paper: https://esumii.github.io/min-caml/paper.pdf.

Cheers,
Nicolas

5 Likes

For readability, I recommend Andreas Rossberg’s HaMLet, which implements Standard ML in a way that follows the structure of the Definition [pdf] rule-by-rule.

For example, here’s the rule for typing handle clauses (the equivalent of OCaml’s try ... with):

    | elabExp D (C, HANDLEExp(exp, match)@@A) =
      (* [Rule 10] *)
      let
        val tau       = elabExp D (C, exp)
        val tau_match = elabMatch D (C, match)
      in
        Type.unify(Type.fromFunType(InitialStaticEnv.tauExn, tau), tau_match)
          handle Type.Unify =>
            error(loc A, "type mismatch in handler");
        tau
      end

If you compare this with the corresponding rule in the Definition:

    C ⊢ exp ⇒ τ        C ⊢ match ⇒ exn → τ
    ─────────────────────────────────────────   (10)
        C ⊢ exp handle match ⇒ τ

then you’ll find a clear correspondence between the parts:

  • elaborating exp in context C produces a type tau
  • elaborating match in context C produces another type tau_match
  • tau_match must equal exn -> tau, so enforce that by unification (raising a type error if the two cannot be made equal)
  • the result of elaborating the whole phrase (exp handle match) is tau
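The same pattern — elaborate the subphrases, then constrain them with unification — is easy to reproduce in miniature. Here is a small self-contained sketch (not HaMLet's actual code; the type representation, `repr`, and `elab_handle` are illustrative names, and the occurs check is omitted):

```ocaml
(* Toy types with mutable type variables, in the usual destructive style. *)
type ty =
  | TVar of tv ref
  | TCon of string            (* e.g. "exn", "int" *)
  | TArrow of ty * ty
and tv = Unbound of int | Link of ty

exception Unify

(* Follow Link chains to the representative, with path compression. *)
let rec repr t =
  match t with
  | TVar ({ contents = Link t' } as r) ->
      let t'' = repr t' in r := Link t''; t''
  | _ -> t

let rec unify t1 t2 =
  match repr t1, repr t2 with
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TVar r, t -> r := Link t            (* no occurs check in this sketch *)
  | t, TVar r -> r := Link t
  | TCon a, TCon b when a = b -> ()
  | TArrow (a1, b1), TArrow (a2, b2) -> unify a1 a2; unify b1 b2
  | _ -> raise Unify

let fresh =
  let n = ref 0 in
  fun () -> incr n; TVar (ref (Unbound !n))

let tau_exn = TCon "exn"

(* Rule 10: if exp : tau and match : exn -> tau,
   then (exp handle match) : tau. *)
let elab_handle tau tau_match =
  unify (TArrow (tau_exn, tau)) tau_match;
  tau
```

For instance, `elab_handle (TCon "int") (fresh ())` instantiates the fresh variable to `exn -> int` and returns `TCon "int"`, while `elab_handle (TCon "int") (TCon "bool")` raises `Unify` — the sketch's analogue of HaMLet's "type mismatch in handler" error.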

11 Likes

I’ve always found the original Caml Light sources very readable (and yet not cited very often). The entire project is incredibly practical: the lexer generator is straightforward (some of the comments, citing chapters of the Dragon Book, persist in the ocamllex implementation to this day!), the parser generator is a modification of byacc, the type inference algorithm is effectively Algorithm J (with Rémy’s levels), etc.

Of course, there are still many concepts in the compiler where you’d need some background. For example, the general ideas behind the Hindley-Milner implementation (destructive unification with Rémy’s levels) are covered in a few notable places (“Le langage Caml” (French), “The Functional Approach to Programming”, and, more accessibly, Oleg Kiselyov’s blog article on the topic). That said, once you know these concepts, they’re very easy to identify in the Caml Light implementation.
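The core of the levels idea fits in a few dozen lines. Below is a toy sketch of destructive unification with Rémy’s levels (all names are illustrative, not Caml Light’s; the occurs check is omitted): each unbound variable records the `let`-nesting level at which it was created, unification sinks variables to the outermost level involved, and at generalization time only variables deeper than the current level may be quantified.

```ocaml
type ty =
  | TVar of tv ref
  | TArrow of ty * ty
  | TInt
and tv =
  | Unbound of int * int   (* id, level *)
  | Link of ty

let current_level = ref 1
let enter_level () = incr current_level   (* entering a let's right-hand side *)
let leave_level () = decr current_level

let fresh_id = let n = ref 0 in fun () -> incr n; !n
let new_var () = TVar (ref (Unbound (fresh_id (), !current_level)))

(* Lower every unbound variable in t to at most lvl. *)
let rec update_level lvl = function
  | TVar ({ contents = Unbound (id, l) } as r) ->
      if l > lvl then r := Unbound (id, lvl)
  | TVar { contents = Link t } -> update_level lvl t
  | TArrow (a, b) -> update_level lvl a; update_level lvl b
  | TInt -> ()

let rec unify t1 t2 =
  match t1, t2 with
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TVar { contents = Link t1' }, _ -> unify t1' t2
  | _, TVar { contents = Link t2' } -> unify t1 t2'
  | TVar ({ contents = Unbound (_, lvl) } as r), t
  | t, TVar ({ contents = Unbound (_, lvl) } as r) ->
      update_level lvl t;       (* vars in t sink to the outer level *)
      r := Link t
  | TArrow (a1, b1), TArrow (a2, b2) -> unify a1 a2; unify b1 b2
  | TInt, TInt -> ()

(* A type may be generalized at a let only if all its unbound
   variables live strictly deeper than the current level. *)
let rec generalizable = function
  | TVar { contents = Unbound (_, l) } -> l > !current_level
  | TVar { contents = Link t } -> generalizable t
  | TArrow (a, b) -> generalizable a && generalizable b
  | TInt -> false
```

The point of the levels is to make generalization O(size of type) instead of scanning the whole environment: a variable created inside the `let` being elaborated (`fun x -> x` gives `'a -> 'a` with `'a` at the inner level) is generalizable, but a variable that got unified with something from an enclosing scope has been sunk to the outer level and is not.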

If your intention is to implement a compiler for a language similar to SML or OCaml, there is a wealth of different resources (books, papers, etc.) that are useful for different parts of the compiler. You just need to mentally delineate the necessary steps (parsing, type checking, normalisation, match compilation, closure conversion, hoisting, further lowering, instruction selection, register allocation, etc.), then seek out resources about them. Many of the front-end and later back-end stages are documented in classical compiler textbooks, but various ideas around functional (middle-end) intermediate representations and their transformations exist in more obscure literature (e.g. Appel’s “Compiling with Continuations”, Tarditi’s PhD thesis, various papers by OCaml contributors and their academic collaborators, papers around MLton, SML/NJ, TIL, MLj, Manticore, MLRISC, sml2c, MLtoAda, etc.).
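One way to keep that mental delineation honest is to give each stage its own IR type, so the pipeline is just typed function composition. A deliberately trivial sketch (every type and stage name here is a placeholder, not any real compiler's API):

```ocaml
(* Each type stands in for a real intermediate representation. *)
type source = string
type ast = Ast of string
type typed_ast = Typed of ast
type anf = Anf of typed_ast          (* after normalisation *)
type clos = Clos of anf              (* after closure conversion *)
type asm = Asm of string

(* Stub stages: each one only consumes the previous IR and
   produces the next, so mis-ordered pipelines don't typecheck. *)
let parse (s : source) : ast = Ast s
let typecheck (a : ast) : typed_ast = Typed a
let normalise (t : typed_ast) : anf = Anf t
let closure_convert (a : anf) : clos = Clos a
let select_instructions (Clos (Anf (Typed (Ast s))) : clos) : asm =
  Asm ("code for " ^ s)

let compile (s : source) : asm =
  s |> parse |> typecheck |> normalise |> closure_convert
    |> select_instructions
```

The payoff of distinct IR types is that the resources mentioned above map one-to-one onto stage boundaries: you can swap in a real match compiler or register allocator behind one of these arrows without disturbing the rest.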

I’ve also found MLton’s Compiler Overview pages to be rather useful as documentation of many of the IRs, transformations, and ideas implemented in MLton.

So, if you get to the point of requiring supplementary resources, you should ask around. There’s no single textbook, paper, or codebase that would give you a full picture of the rabbit holes in compiler engineering.

7 Likes

The MLWorks system has a compiler which is IMO fairly readable, though probably not as much as HaMLet. The typechecker at least is structured very closely around the Definition.

2 Likes