Modern alternative to inline Camlp4/5 parsers

I used to write inline parsers like this using Camlp4 (now Camlp5, I think):

open Camlp4.PreCast

let expr = Gram.Entry.mk "expr"
let defn = Gram.Entry.mk "defn"
let prog = Gram.Entry.mk "prog"

EXTEND Gram
  expr:
  [ [ "if"; p = expr; "then"; t = expr; "else"; f = expr ->
        If(p, t, f) ]
  | [ e1 = expr; "<="; e2 = expr -> BinOp(`Leq, e1, e2) ]
  | [ e1 = expr; "+"; e2 = expr -> BinOp(`Add, e1, e2)
    | e1 = expr; "-"; e2 = expr -> BinOp(`Sub, e1, e2) ]
  | [ f = expr; x = expr -> Apply(f, x) ]
  | [ v = LIDENT -> Var v
    | n = INT -> Int(int_of_string n)
    | "("; e = expr; ")" -> e ] ];
  defn:
  [ [ "let"; "rec"; f = LIDENT; x = LIDENT; "="; body = expr ->
        LetRec(f, x, body) ] ];
  prog:
  [ [ defns = LIST0 defn; "do"; run = expr -> defns, run ] ];
END

This was one of the best things about OCaml. For example, see LLVM: A native-code compiler for MiniML in ~100LOC (2007).

I’m just trying to get back into it, but a lot has changed since then: Camlp4 was renamed to Camlp5, a new Camlp4 replaced it, and that in turn was dropped in 2017 in favor of PPX. PPX only seems to do a tiny fraction of what Camlp4 did and, in particular, I cannot find any information about how to write inline parsers using PPX.

So, how do I write inline parsers in modern OCaml?


Camlp4 is still maintained and is easily available through OPAM.

You are totally right that PPX is a less powerful solution, but at the moment there is no proper successor to Camlp4 (which was dropped for good reasons).

As for inline parsers, consider parsec-style combinators, which have certain downsides, but are good enough for many use cases.
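To give an idea of what the combinator route looks like, here is a minimal sketch that uses only the standard library rather than any particular combinator package. It handles roughly the `expr` entry from the question; the AST type, the combinator names (`>>=`, `>>|`, `<|>`, `many`, `chainl`) and the token helpers are all made up for illustration, and identifiers are restricted to lowercase letters to keep it short. Precedence is encoded by one function per level instead of Camlp4's level list, and there is no error reporting, which is one of the downsides mentioned above.

```ocaml
(* Hypothetical AST mirroring the constructors used in the question. *)
type expr =
  | Int of int
  | Var of string
  | BinOp of [ `Leq | `Add | `Sub ] * expr * expr
  | If of expr * expr * expr
  | Apply of expr * expr

(* A parser reads from a position and returns a value plus the new position. *)
type 'a t = string -> int -> ('a * int) option

let return x : 'a t = fun _ pos -> Some (x, pos)
let ( >>= ) (p : 'a t) (f : 'a -> 'b t) : 'b t = fun s pos ->
  match p s pos with None -> None | Some (x, pos') -> f x s pos'
let ( >>| ) p f = p >>= fun x -> return (f x)
(* Ordered choice: try [p]; if it fails, try [q] from the same position. *)
let ( <|> ) (p : 'a t) (q : 'a t) : 'a t = fun s pos ->
  match p s pos with None -> q s pos | r -> r
(* Zero or more repetitions of [p]; [p] must consume input when it succeeds. *)
let rec many (p : 'a t) : 'a list t = fun s pos ->
  match p s pos with
  | None -> Some ([], pos)
  | Some (x, pos') ->
    (match many p s pos' with
     | Some (xs, pos'') -> Some (x :: xs, pos'')
     | None -> None)

let is_space c = c = ' ' || c = '\n' || c = '\t'
let is_digit c = '0' <= c && c <= '9'
let is_lower c = 'a' <= c && c <= 'z'
let rec skip s pos =
  if pos < String.length s && is_space s.[pos] then skip s (pos + 1) else pos

(* Skip whitespace, then take the longest non-empty run of chars satisfying [ok]. *)
let span ok : string t = fun s pos ->
  let start = skip s pos in
  let rec go i = if i < String.length s && ok s.[i] then go (i + 1) else i in
  let stop = go start in
  if stop = start then None else Some (String.sub s start (stop - start), stop)

let keywords = [ "if"; "then"; "else" ]
let ident : string t = fun s pos ->
  match span is_lower s pos with
  | Some (w, _) when List.mem w keywords -> None
  | r -> r
let kw k : unit t = fun s pos ->
  match span is_lower s pos with
  | Some (w, pos') when w = k -> Some ((), pos')
  | _ -> None
let sym k : unit t = fun s pos ->
  let start = skip s pos and n = String.length k in
  if start + n <= String.length s && String.sub s start n = k
  then Some ((), start + n) else None

(* Left-associative chain: operand (op operand)* *)
let chainl (operand : expr t) (op : (expr -> expr -> expr) t) : expr t =
  operand >>= fun first ->
  many (op >>= fun f -> operand >>| fun e -> (f, e)) >>| fun rest ->
  List.fold_left (fun acc (f, e) -> f acc e) first rest

(* One function per precedence level, loosest first. *)
let rec expr s pos =
  ((kw "if" >>= fun () -> expr >>= fun p ->
    kw "then" >>= fun () -> expr >>= fun t ->
    kw "else" >>= fun () -> expr >>| fun f -> If (p, t, f))
   <|> cmp) s pos
and cmp s pos =
  chainl arith (sym "<=" >>| fun () a b -> BinOp (`Leq, a, b)) s pos
and arith s pos =
  chainl app
    ((sym "+" >>| fun () a b -> BinOp (`Add, a, b))
     <|> (sym "-" >>| fun () a b -> BinOp (`Sub, a, b))) s pos
and app s pos =
  (atom >>= fun f -> many atom >>| fun args ->
   List.fold_left (fun g x -> Apply (g, x)) f args) s pos
and atom s pos =
  ((span is_digit >>| fun n -> Int (int_of_string n))
   <|> (ident >>| fun v -> Var v)
   <|> (sym "(" >>= fun () -> expr >>= fun e -> sym ")" >>| fun () -> e)) s pos

(* Succeeds only if the whole input (up to trailing whitespace) is consumed. *)
let parse (input : string) : expr option =
  match expr input 0 with
  | Some (e, pos) when skip input pos = String.length input -> Some e
  | _ -> None
```

With this, `parse "f (n - 1) + 2"` should give an `Apply` nested inside a `BinOp` with `` `Add ``, and incomplete input such as `"1 +"` gives `None`. Unlike the Camlp4 grammar, a bare `if` is not accepted as an operand without parentheses; that is a simplification of this sketch, not a limitation of combinators in general.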

It should be possible to build a PPX that does much of what camlp4 permitted in the example above, should it not?

I’ve discussed with several people the fact that nothing prevents parser generators from being presented as ppx preprocessors. We could have a mode of Menhir that fits inside extensions, and is processed by a ppx. One “just” needs someone to do this work, and no one has bothered thus far, but I know it is at least also on @let-def’s radar.

I appreciate the real usability benefits of writing one’s parser non-invasively inside a ML module (and the long-term benefit of using a standard build scheme instead of parser-generator-specific rules). But I think that the benefits I get from using Menhir to write my grammars outweigh the cost of having a separate file with reduced OCaml tooling, so I prefer it to “inline” solutions.


I think if you’re writing the parser for something complicated, having a distinct Menhir file is not much of a burden. If you want to write a parser for a very small thing (say, your config file format), being able to throw the lexer and parser inline might be pleasant. Of course, one would have to experiment with it a bit first, I think.
