How to write lexer rules correctly

I’m learning how to write a parser, and I was following the calc.ml example from Chapter 13 of the manual (OCaml - Lexer and parser generators (ocamllex, ocamlyacc)). The program compiles and runs as expected.

When the input contains a character that is not listed in the lexer rules (the semicolon in “2;” below), the program fails:

$ bash calc.sh
11 states, 267 transitions, table size 1134 bytes
2*2
4
2;
Fatal error: exception Failure("lexing: empty token")

What is the idiomatic way of handling this?

I want the program to print an error message that contains the invalid token and its position in the input, and to keep running instead of terminating.

Should the surrounding code simply handle the Failure exception?
Should the lexer include a catch-all rule for an “invalid token” type and let the parser produce the error message instead?
Can you recommend a source file from a real application or library that would be a good example to read and follow?

The best way is to add a “catch all” _ rule and handle it any way you want.

In the action corresponding to the _ pattern, you have access to the current lexbuf, from which you can extract the current position and use it to report a nice error message.
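To make that concrete, here is a rough sketch of what the lexer could look like with a catch-all rule added. The rules other than the last one are reproduced from the manual's calc example as best I recall them; the exception name Lex_error and the Lexing.new_line call in the newline rule are my additions, not part of the manual's code. Without the Lexing.new_line call the line number in the reported position would stay at 1.

```ocaml
(* lexer.mll: sketch of the manual's calc lexer plus a catch-all rule *)
{
open Parser        (* the token type comes from parser.mly *)
exception Eof
(* Hypothetical exception: a message and the position of the bad input. *)
exception Lex_error of string * Lexing.position
}

rule token = parse
    [' ' '\t']        { token lexbuf }                 (* skip blanks *)
  | '\n'              { Lexing.new_line lexbuf; EOL }  (* keep line numbers current *)
  | ['0'-'9']+ as lxm { INT (int_of_string lxm) }
  | '+'               { PLUS }
  | '-'               { MINUS }
  | '*'               { TIMES }
  | '/'               { DIV }
  | '('               { LPAREN }
  | ')'               { RPAREN }
  | eof               { raise Eof }
  (* Catch-all: must come last so the single-character rules above win. *)
  | _ as c            { raise (Lex_error
                                 (Printf.sprintf "unexpected character %C" c,
                                  Lexing.lexeme_start_p lexbuf)) }
```

With this rule in place, the lexer no longer falls through to the built-in Failure "lexing: empty token"; every unknown character goes through the action you wrote.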

In general handling lexing errors in the parser seems a bit backwards. What I would do is to raise an ad-hoc exception with the error message information and handle that exception in the code that calls into the parser & lexer.
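A minimal sketch of the calling side, under a few assumptions of mine: the exception is called Lex_error (in a real project it would be defined in lexer.mll and referred to as Lexer.Lex_error), input is read one line at a time, and each line gets a fresh lexbuf so that leftover characters after an error cannot confuse the next parse. The helper names describe, eval_line and repl are made up for illustration.

```ocaml
(* Hypothetical exception; normally defined in the lexer's header section. *)
exception Lex_error of string * Lexing.position

(* Render a position as "line L, column C" (column is 1-based). *)
let describe (p : Lexing.position) =
  Printf.sprintf "line %d, column %d"
    p.Lexing.pos_lnum
    (p.Lexing.pos_cnum - p.Lexing.pos_bol + 1)

(* Run [parse] on a single line. A lexing error becomes an Error message
   instead of escaping as an exception. The newline is appended back
   because the manual's grammar ends each expression with an EOL token. *)
let eval_line (parse : Lexing.lexbuf -> int) (line : string)
  : (int, string) result =
  let lexbuf = Lexing.from_string (line ^ "\n") in
  try Ok (parse lexbuf)
  with Lex_error (msg, pos) ->
    Error (Printf.sprintf "%s at %s" msg (describe pos))

(* Read-eval-print loop: a bad line prints a message and the loop goes on. *)
let repl (parse : Lexing.lexbuf -> int) =
  try
    while true do
      match eval_line parse (input_line stdin) with
      | Ok n -> Printf.printf "%d\n%!" n
      | Error msg -> Printf.eprintf "Error: %s\n%!" msg
    done
  with End_of_file -> ()
```

With the manual's example this would be started as repl (Parser.main Lexer.token), assuming the lexer raises this same exception. Syntax errors from the parser (Parsing.Parse_error with ocamlyacc) are not handled in this sketch and would need their own case in eval_line.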

One example is in the compiler source itself:

Cheers,
Nicolas