Can 'Error: Syntax error' be improved?

yawaramin · March 18, 2025, 8:36pm

Eg:

# let class = 1;;
      ^^^^^
Error: Syntax error

How difficult would it be to add some explanation here, eg Error: Syntax error: expected a pattern, got 'class'?

nojb · March 18, 2025, 9:15pm

Not trivial: Research

Cheers,
Nicolas

yawaramin · March 18, 2025, 10:51pm

Thanks. Looking at: https://def.lakaban.net/research/2024-LRGrep-dissertation.pdf

It says:

However, only some rules can utilise the error symbol due to the rapid introduction of ambiguities. When multiple error clauses apply to a single situation, which one should prevail? As a result, most cases are not covered, and the parser defaults to a generic “Syntax error” message.

Well, what would happen if the first error clause was always chosen? For example, what would be reported for my code snippet given above?

Chet_Murthy · March 18, 2025, 11:18pm

There’s a different answer: predictive parsers do this, right? And LR parsers are not predictive.

welltypedwitch · March 19, 2025, 12:21am

Aside from engineering effort, is there anything standing in the way of OCaml using something more error message friendly like recursive descent instead of an LR parser?

yawaramin · March 19, 2025, 1:58am

If a file has the code:

let class = 1

And the parser reads the characters let , it knows that the next expected token must be a pattern, right? EDIT: or the keyword open

nojb · March 19, 2025, 6:52am

The CAML system (which preceded Caml Light, which preceded OCaml) had a recursive descent parser; see the Appendix of the CAML reference manual.

In the paper that introduced the system that would become Caml Light and then OCaml, @xavierleroy had this to say about the parser (see Section 5.2.1, page 55 and 56):

Parsing ML is not easy: there is a huge number of possible ambiguities, since most constructs are “open” (i.e. without closing delimitor), and even more so if one wants to provide the same amount of syntactic sugar provided by CAML, for compatibility reasons. The conventional approach is to multiply the nonterminals of the grammar (e.g. having several entry points for expressions, corresponding to expressions with various priorities), in order to resolve these ambiguities in the grammar definition itself. This leads to very convoluted and hard to read grammars; the actual grammar of CAML [62, appendix Z] is a very convincing example of this tendency!

Yannick Martel and I dismissed this approach and chose instead to write a very natural grammar, but highly ambiguous, and rely on Yacc’s disambiguation mechanisms (precedences, associativities, etc.) to get correct parsing. The result is a concise grammar (about five times smaller than the corresponding fragment of CAML grammar), fairly easy to read (even for a non-specialist like me), but with some 50 reduce/reduce conflicts and 500 shift/reduce conflicts! As it is impossible to check each conflict to see if it was resolved correctly, it is not possible to know exactly what the resulting parser does, and getting a parser that works correctly on test CAML programs required quite a bit of trial and error… Yacc is partly responsible for this situation, since the conflict resport and disambiguation tools it provides are fairly crude. For instance, precedence and associativities are not taken into account to solve reduce/reduce conflicts. And to understand where the conflicts lie, one has to look at the generated automata, which requires training and fortitude. We conclude that common parser generator technology is not yet powerful enough to easily tackle languages with such a tricky syntax as CAML.

Since then, this decision has not been revisited in the main system as far as I know (other than to switch to Menhir a few years ago). The Camlp4 and Camlp5 systems had a reimplementation of the parser using recursive descent technology that was maintained in parallel, but they had their own complexity issues and they are not actively developed anymore.

In summary: writing a recursive descent parser for OCaml is technically possible, but a considerable amount of work and no-one has actually stepped forward to experiment with such an approach. Even if successful, the resulting parser is likely to be of considerable complexity and have the same drawbacks that motivated the switch to LR parsers in the first place… On the other hand, it could allow for better error reporting; but it is not clear if this would be enough to justify a switch back to recursive descent.

Cheers,
Nicolas

octachron · March 19, 2025, 9:00am

Moreover, any effort taken to write a recursive descent parser would be probably more efficiently spent taking the time to apply menhir and LRgrep improved error reporting to the existing parser. I know this is the todo list of few people (me at the very least).

Chet_Murthy · March 19, 2025, 3:40pm

I maintain the Camlp5 grammar that is the descendant of the LL parser you describe, and I can attest to the truth in your summary. And you’re 100% right, that the switch back to LL would not be justified.

What -might- justify such a switch, would be switching to using ANTLR, with much more lookahead. But that’d be an enormous change, not least since it would require either using ANTLR’s Java code, or porting that to OCaml.

yawaramin · March 19, 2025, 7:22pm

Does the error report need to be absolutely perfect? Here’s a Java example:

// Test.java

class Test {
  int i;

Output:

$ javac Test.java
Test.java:2: error: reached end of file while parsing
  int i;
        ^
1 error

Obviously the error is the missing closing brace. But it just says ‘reached end of file’. But clearly it’s something and at least points you in the right direction.

My point is, is there a way with Menhir’s error reporting to just pick one of the ambiguous options, maybe the first one, and report it as the error? Surely reporting something is better than reporting nothing?

Topic		Replies	Views
Taciturn yet severe compiler talks 'Syntax Error' Learning	2	468	October 25, 2019
Why a handwritten parser? Learning	11	3167	February 16, 2021
Updating the “Error Handling” tutorial Community user-feedback , ocamlorg , tutorial	27	2339	May 19, 2023
Syntax Error #use Ocaml Learning	6	437	February 24, 2024
Syntax Error on "in" Learning	4	1008	June 2, 2023

Can 'Error: Syntax error' be improved?

Related topics