Good example of handwritten Lexer + Recursive Descent Parser?

Hi everyone,

I am looking for an idiomatic implementation of a Lexer + Recursive Descent Parser not making use of ocamllex, ocamlyacc or menhir. The kind you would write during the first chapters of Crafting Interpreters [0][1] in OCaml.

This Markdown parser [2] by @dbuenzli is a great example of what I am looking for. I’d be happy if you can recommend similar resources.

There are many OCaml repos for Lox interpreters but it’s hard to assess quality. And the readme often says “I am doing this to learn OCaml”, which doesn’t inspire confidence.

As a broader note, it would be nice to have (community vetted) OCaml translations of well-known learning material using mainstream languages. But I’ll raise the topic in another thread later.

Thanks in advance.

[0] Scanning · Crafting Interpreters
[1] Parsing Expressions · Crafting Interpreters
[2] cmarkit/src/cmarkit.ml at af8930c307957a546ea833bbdabda94a2fa60b4b · dbuenzli/cmarkit · GitHub

3 Likes

You might be interested in the book Compiling to Assembly from Scratch. There is a port to OCaml. It suggests the use of parser combinators.

Parser combinators is the same manual recursive descent method, but in a functional way. You can either use an existing library or you can write your own.

3 Likes

I am curious about your claim that parser combinators is the same approach as recursive descent parsing. This feels vaguely true but is there anywhere I can read more about this? Maybe a comparison?

odoc’s parser is half of what you’re asking for. It uses ocamllex for the lexical scan, because it’s very simple and convenient to do it that way, but the syntax is then analyzed by a hand-written recursive descent parser, in large part because that’s easier for the doc language.

An example of a non-ocamllex and non-ocamlyacc parser is Markup.ml (tokenizer, parser). But this isn’t a traditional recursive descent parser. Rather, it’s a pretty huge hand-written state machine in continuation-passing style, almost completely implementing the corresponding huge state machine specified in HTML5. But it’s the kind of code that fits well the topics of an impure FP class, especially since it has mutable cells for its continuations, that it uses to mimic effects.

1 Like

I’m not a big expert, but that’s what the original article or Wikipedia page says. Classical parser combinators are top-down parsers for LL(k) grammars, using the same strategy as recursive descent.

As an example, I have written a super toy JSON parser that uses the recursive descent method. It can be ported 1 to 1 to a monadic version that uses the same parsing strategy.

2 Likes

Thank you guys. I enjoyed reading these resources, it was super instructive.

In the spirit of introducing OCaml to a Java developer reading the Crafting Interpreters book. I have partially implemented the scanner here. It sticks to the book’s approach and function names, I am just stuffing the whole program state into a state record that I pass around explicitly.

I think this is very understandable and still showcases many strengths of OCaml. What do you think of this (boring) “OCaml from Java” approach as a way of onboarding new devs?