[ANN] New release of Menhir (20211125)

I am pleased to announce a new release of Menhir, with an exciting
contribution by Frédéric Bour: a groundbreaking performance improvement in
menhir --list-errors. This is made possible by an entirely new reachability
algorithm, designed and implemented by Frédéric, which is described in our
paper “Faster Reachability Analysis for LR(1) Parsers”. The paper is
available here:

http://cambium.inria.fr/~fpottier/publis/bour-pottier-reachability.pdf

To install the new release, just type

  opam update
  opam install menhir.20211125

Enjoy!


François Pottier
Francois.Pottier@inria.fr
http://cambium.inria.fr/~fpottier/

  • The command menhir --list-errors has been sped up by a factor of up
    to 100 and requires up to 1000 times less memory, thanks to a new
    LR(1) reachability algorithm, which has been designed and implemented
    by Frédéric Bour.

  • Better document the restricted way in which the error token must be
    used when using --strategy simplified. Menhir now checks that this
    token is used only at the end of a production, and warns if this is
    not the case. (Better yet, our suggestion is to not use the error
    token at all!)

  • The $syntaxerror keyword is now forbidden when using
    --strategy simplified. This keyword will be entirely removed
    in the next release. Incidentally, we have just found out that
    it behaves differently under the code back-end and under the
    table back-end.

  • Disable OCaml warning 39 (unused rec flag) in the OCaml code produced
    by Menhir’s code back-end. This does not affect the table back-end.
    (Reported by Armaël Guéneau.)

  • Fix a bug in --random-* which could cause Menhir to diverge if the
    grammar uses the error token.

  • Warn if a terminal symbol is named Error. This creates a name clash
    in the public interface of the generated parser.

  • Menhir now requires OCaml 4.03.0 (instead of 4.02.3)
    and Dune 2.8.0 (instead of 2.0.0).
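For readers who have not met warning 39, here is a minimal sketch of what it reports and of one standard way to silence it. This is not taken from Menhir's generated code, and the release notes do not say which mechanism Menhir uses to disable the warning; the floating attribute below is simply an assumption chosen for illustration, and succ_twice is a hypothetical name.

```ocaml
(* Warning 39 ("unused rec flag") is reported when a function is declared
   with [let rec] but never refers to itself. *)

(* Silence warning 39 for the rest of this file. *)
[@@@ocaml.warning "-39"]

(* [rec] is unnecessary here: [succ_twice] is not recursive. Without the
   attribute above, the compiler would report warning 39 on this line. *)
let rec succ_twice x = x + 2

let () = Printf.printf "%d\n" (succ_twice 40)
```

Generated code often declares functions with let rec uniformly, whether or not each one is actually recursive, which is presumably why a code generator would prefer to disable this warning rather than track recursion precisely.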

It’s not often you see sentences like “improves performance by 100x”, particularly in fields like automata theory, and I’m still immensely impressed by it. :smiley:

Let me start with a thank you for the new version of Menhir: a 100x speedup and 1000x less memory, wow.
I look forward to using this new version.

Perhaps this isn’t the place to report errors, but I’m seeing the following compilation issue on Opam-CI with OCaml 4.08 to 4.10 when building atdgen with the latest Menhir.

### output ###
#       ocamlc atd/src/.atd.objs/byte/atd__Parser.{cmo,cmt} (exit 2)
# (cd _build/default && /home/opam/.opam/4.08/bin/ocamlc.opt -w -40 -w -27 -safe-string -g -bin-annot -I atd/src/.atd.objs/byte -I /home/opam/.opam/4.08/lib/easy-format -I /home/opam/.opam/4.08/lib/re -I /home/opam/.opam/4.08/lib/seq -intf-suffix .ml -no-alias-deps -open Atd -o atd/src/.atd.objs/byte/atd__Parser.cmo -c -impl atd/src/parser.ml)
# File "atd/src/parser.ml", line 157, characters 2-546:
# 157 | ..fun _menhir_env _menhir_stack ->
# 158 |     let (_menhir_env : _menhir_env) = _menhir_env in
# 159 |     let (_menhir_stack : ('freshtv431 * _menhir_state * Lexing.position) * _menhir_state * 'tv_variant_list) = Obj.magic _menhir_stack in
# 160 |     let (_endpos : Lexing.position) = _menhir_env._menhir_lexbuf.Lexing.lex_curr_p in
# 161 |     let (_startpos : Lexing.position) = _menhir_env._menhir_lexbuf.Lexing.lex_start_p in
# 162 |     let _menhir_env = _menhir_discard _menhir_env in
# 163 |     (_menhir_reduce49 _menhir_env (Obj.magic _menhir_stack) _endpos _startpos : 'freshtv432)
# Error: This definition has type
#          'ttv_tail.
#            _menhir_env ->
#            ('ttv_tail * _menhir_state * Lexing.position) * _menhir_state *
#            'tv_variant_list -> 'freshtv432
#        which is less general than
#          'ttv_tail 'ttv_return.
#            _menhir_env ->
#            ('ttv_tail * _menhir_state * Lexing.position) * _menhir_state *
#            'tv_variant_list -> 'ttv_return
#     ocamlopt atd/src/.atd.objs/native/atd__Parser.{cmx,o} (exit 2)
# (cd _build/default && /home/opam/.opam/4.08/bin/ocamlopt.opt -w -40 -w -27 -safe-string -g -I atd/src/.atd.objs/byte -I atd/src/.atd.objs/native -I /home/opam/.opam/4.08/lib/easy-format -I /home/opam/.opam/4.08/lib/re -I /home/opam/.opam/4.08/lib/seq -intf-suffix .ml -no-alias-deps -open Atd -o atd/src/.atd.objs/native/atd__Parser.cmx -c -impl atd/src/parser.ml)
# File "atd/src/parser.ml", line 157, characters 2-546:
# 157 | ..fun _menhir_env _menhir_stack ->
# 158 |     let (_menhir_env : _menhir_env) = _menhir_env in
# 159 |     let (_menhir_stack : ('freshtv431 * _menhir_state * Lexing.position) * _menhir_state * 'tv_variant_list) = Obj.magic _menhir_stack in
# 160 |     let (_endpos : Lexing.position) = _menhir_env._menhir_lexbuf.Lexing.lex_curr_p in
# 161 |     let (_startpos : Lexing.position) = _menhir_env._menhir_lexbuf.Lexing.lex_start_p in
# 162 |     let _menhir_env = _menhir_discard _menhir_env in
# 163 |     (_menhir_reduce49 _menhir_env (Obj.magic _menhir_stack) _endpos _startpos : 'freshtv432)
# Error: This definition has type
#          'ttv_tail.
#            _menhir_env ->
#            ('ttv_tail * _menhir_state * Lexing.position) * _menhir_state *
#            'tv_variant_list -> 'freshtv432
#        which is less general than
#          'ttv_tail 'ttv_return.
#            _menhir_env ->
#            ('ttv_tail * _menhir_state * Lexing.position) * _menhir_state *
#            'tv_variant_list -> 'ttv_return
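As background on the “less general than” message in the log above, here is a minimal sketch of the general class of error. It is not a diagnosis of the atd failure and does not reproduce Menhir's generated code; the names first and bad are hypothetical. An annotation of the form 'a 'b. ... is an explicitly polymorphic annotation: the definition must type-check for every choice of the listed type variables.

```ocaml
(* Accepted: the body works for every choice of 'a and 'b, so the
   definition is as general as its annotation demands. *)
let first : 'a 'b. 'a * 'b -> 'a = fun (x, _) -> x

(* Rejected if uncommented: the body forces the result type to be [int],
   so the inferred type is "less general than" the annotation.

   let bad : 'a. 'a -> 'a = fun x -> x + 1
*)

let () = print_endline (first ("polymorphic", 0))
```

In the log, the annotation quantifies over both 'ttv_tail and 'ttv_return, while the type inferred for the definition still mentions the fixed variable 'freshtv432 in result position, which is why the compiler rejects it.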

In case someone turns up here because of the build issue: an issue was created and a fix documented.