Looking for an OCaml parser in ANTLR or JFlex

Hello,
I’m looking for some grammar files to parse OCaml code with ANTLR or JFlex.

The final goal is to add support for the language to an open source editor for Android (and then build an app, as I did for iOS recently).

Any help to make this project real is welcome!

Here are some links that may be useful:

https://askra.de/software/ocaml-doc/4.02/full-grammar.html
https://ocaml.org/manual/language.html

There is a Tree-sitter grammar for OCaml, though I'm not sure whether it is useful for you: tree-sitter/tree-sitter-ocaml on GitHub (OCaml grammar for tree-sitter).

Thanks for your answer, I knew about it but as Tree-sitter is its own parsing engine that isn’t compatible with ANTLR, I didn’t mention it here.

I just found out about bnfc:

The BNF Converter (bnfc) is a compiler construction tool generating a compiler front-end from a Labelled BNF grammar.

It can generate an ANTLR grammar for you given a BNF description of OCaml’s syntax.
The first link you gave may be a good starting point, but it is not up to date (it covers OCaml 4.02). Maybe an updated version exists somewhere?

Note that the language reference in the manual is split into two chapters: The OCaml language and Language extensions.
Also, for reference, the compiler's own lexer and parser: lexer.mll and parser.mly


I’m going to take a deeper look this weekend to see whether I can write a BNF description.

The OCaml grammar in the compiler distribution is written for Menhir. I wonder what it would take to postprocess it to produce a grammar suitable for ocamlyacc? In the past I’ve taken yacc grammars for other languages (e.g. Golang … bletch) and “ported” them to work with ocamlyacc, so perhaps the other direction is feasible? The lexer is written for ocamllex; perhaps it can be massaged to be usable with flex? (That’ll be harder: I know that some of the lexemes are scanned with code instead of a regex.)
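
To illustrate why some lexemes need code rather than a regex: OCaml comments nest, so `(* a (* b *) c *)` is a single comment, and no single regular expression can match that. Here is a minimal sketch in Java of what a lexer action on the JFlex side would have to do. The class and method names are made up for illustration, and I'm assuming nested comments are one of the cases in question. It also ignores string and character literals inside comments, which a faithful port would have to handle.

```java
// Hypothetical helper for illustration; not part of JFlex or the OCaml compiler.
public final class NestedComments {
    private NestedComments() {}

    /**
     * Skips the OCaml comment that opens at index {@code start}.
     * Returns the index just past the matching "*)", or -1 if the
     * comment is never closed. Simplification: string and character
     * literals inside the comment are not treated specially.
     */
    public static int skipComment(String src, int start) {
        if (!src.startsWith("(*", start)) {
            throw new IllegalArgumentException("no comment opener at index " + start);
        }
        int depth = 0; // number of currently open "(*"
        int i = start;
        while (i < src.length()) {
            if (src.startsWith("(*", i)) {
                depth++;
                i += 2;
            } else if (src.startsWith("*)", i)) {
                depth--;
                i += 2;
                if (depth == 0) {
                    return i; // outermost comment is closed
                }
            } else {
                i++;
            }
        }
        return -1; // reached end of input with the comment still open
    }
}
```

For example, `NestedComments.skipComment("(* a (* b *) c *) x", 0)` returns 17, the index of the space right after the outer `*)`. In an actual JFlex specification the same depth counter would more likely live in a dedicated lexical state than in a standalone method.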

And then, maybe an LALR grammar could be massaged to produce an LL(k) grammar?

I don’t know how feasible this course is. Of course, if it’s at all possible to use a Java Yacc instead of ANTLR, that might make things much easier. As I said, I’ve “ported” Yacc grammars from one Yacc to another. Sure, you have to rip out all the actions and put in new ones, but it’s still far, far superior to having to figure out a deterministic grammar from an ambiguous BNF.