How to write lexer rules correctly

mkp · July 18, 2021, 5:15pm

I’m learning how to write a parser, and I was following the calc.ml example from Chapter 13 of the manual (OCaml - Lexer and parser generators (ocamllex, ocamlyacc)). The program compiles and runs as expected.

When the input contains a character that is not listed in the lexer rules (the semicolon in “2;” below), the program fails:

$ bash calc.sh
11 states, 267 transitions, table size 1134 bytes
2*2
4
2;
Fatal error: exception Failure("lexing: empty token")

What is the idiomatic way of handling this?

I want the program to print an error message that contains the invalid token and its position in the input, and to keep running instead of terminating.

Should the surrounding code simply handle the Failure exception?
Should the lexer include a catch-all rule for an “invalid token” type and let the parser produce the error message instead?
Can you recommend a source file from a real application or library that would be a good example to read and follow?

nojb · July 18, 2021, 6:01pm

The best way is to add a “catch all” _ rule and handle it any way you want.

In the action corresponding to the _ pattern you have access to the current lexbuf from which you can extract the current position and use that to report a nice error message.
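
As a rough sketch of the position part: the standard Lexing module exposes the start of the current lexeme as a Lexing.position record, which can be turned into a line and column. The helper name format_position is made up for this illustration. Note that generated lexers only keep pos_lnum and pos_bol up to date if the newline rule calls Lexing.new_line lexbuf.

```ocaml
(* Hypothetical helper (name chosen for this sketch): describe where the
   current lexeme starts, using the standard Lexing.position record. *)
let format_position (lexbuf : Lexing.lexbuf) : string =
  let p = Lexing.lexeme_start_p lexbuf in
  (* pos_cnum is the absolute offset; pos_bol is the offset of the start
     of the current line, so their difference is the 0-based column. *)
  Printf.sprintf "line %d, column %d"
    p.Lexing.pos_lnum
    (p.Lexing.pos_cnum - p.Lexing.pos_bol)

let () =
  (* A fresh lexbuf has not consumed anything yet, so this reports the
     initial position of the buffer. *)
  let lexbuf = Lexing.from_string "2;" in
  print_endline (format_position lexbuf)
```

Inside the action of the _ rule, the same call on the lexbuf argument would give the position of the offending character.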

In general, handling lexing errors in the parser seems a bit backwards. What I would do is raise an ad hoc exception carrying the error information and handle that exception in the code that calls into the parser and lexer.
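
A minimal sketch of that shape, under some assumptions: the exception Lexing_error, the token type, and the functions tokenize and describe are all invented for this example, and tokenize is a hand-written stand-in for the ocamllex-generated lexer so that the snippet runs on its own. In a real .mll file the last case would instead be a catch-all rule along the lines of `| _ as c { raise (Lexing_error (c, Lexing.lexeme_start lexbuf)) }`.

```ocaml
(* Hypothetical ad hoc exception: the offending character and its
   0-based offset in the input. *)
exception Lexing_error of char * int

type token = INT of int | PLUS | MINUS | TIMES | DIV | LPAREN | RPAREN

(* Stand-in for the generated lexer. The final case plays the role of
   the catch-all rule: anything not matched above raises Lexing_error. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' -> go (i + 1) acc
      | '0' .. '9' ->
        (* Scan to the end of the run of digits. *)
        let j = ref i in
        while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
        go !j (INT (int_of_string (String.sub s i (!j - i))) :: acc)
      | '+' -> go (i + 1) (PLUS :: acc)
      | '-' -> go (i + 1) (MINUS :: acc)
      | '*' -> go (i + 1) (TIMES :: acc)
      | '/' -> go (i + 1) (DIV :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | c -> raise (Lexing_error (c, i))
  in
  go 0 []

(* The calling code handles the exception, so one bad line produces a
   message instead of ending the program. *)
let describe (line : string) : string =
  match tokenize line with
  | tokens -> Printf.sprintf "ok: %d token(s)" (List.length tokens)
  | exception Lexing_error (c, pos) ->
    Printf.sprintf "error: illegal character %C at offset %d" c pos

let () =
  List.iter (fun line -> print_endline (describe line)) ["2*2"; "2;"; "1+1"]
```

With the calc example, the same try/match would go around the call to the parser inside the main loop, so the loop carries on with the next line after reporting the error.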

One example is in the compiler source itself:

github.com/ocaml/ocaml/blob/f203a5d45dd32216596d3de1f5fd02cd8cab6f7b/parsing/lexer.mll#L585-L586

          | (_ as illegal_char)
              { error lexbuf (Illegal_character illegal_char) }

Cheers,
Nicolas