[ANN] New release of Menhir (20211230)

fpottier · December 31, 2021, 8:26am

Dear OCaml & Menhir users,

I am pleased to announce a new release of Menhir, with a major improvement.

The code back-end has been rewritten from the ground up by Émile Trotignon
and by myself, and now produces efficient and well-typed OCaml code. The
infamous Obj.magic is not used any more.

Furthermore, the new code back-end produces code that is more aggressively
optimized, leading to a significant reduction in memory allocation and a
typical performance improvement of up to 20% compared to the previous code
back-end.

  opam update
  opam install menhir.20211230

Happy well-typed parsing in 2022!

2021/12/30

The code back-end has been rewritten from the ground up by Émile Trotignon
and François Pottier, and now produces efficient and well-typed OCaml
code. The infamous Obj.magic is not used any more.

The table back-end and the Coq back-end are unaffected by this change.

The main side effects of this change are as follows:
- The code back-end now needs type information. This means that
  either Menhir’s type inference mechanism must be enabled
  (the easiest way to enable it is to use Menhir via dune
  and to check that the dune-project file says
  (using menhir 2.0) or later),
  or the type of every nonterminal symbol must be
  given explicitly via a %type declaration.
- The code back-end no longer allows the type of any symbol to be an
  open polymorphic variant type, such as [> `A ]. As a workaround,
  we suggest using a closed polymorphic variant instead.
- The code back-end now adheres to the simplified error-handling strategy,
  as opposed to the legacy strategy.
  
  For grammars that do not use the error token, this makes no difference.
  
  For grammars that use the error token in the limited way permitted by
  the simplified strategy, this makes no difference either. The simplified
  strategy makes the following requirement: the error token should always
  appear at the end of a production, whose semantic action should abort the
  parser by raising an exception.
  
  Grammars that make more complex use of the error token, and therefore
  need the legacy strategy, cannot be compiled by the new code back-end.
  As a workaround, it is possible to switch to the table back-end (using
  --table --strategy legacy) or to the ancient code back-end (using
  --code-ancient). In the long run, we recommend abandoning the use of
  the error token. Support for the error token may be removed
  entirely at some point in the future.
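
The polymorphic-variant workaround can be pictured with a minimal OCaml sketch. It assumes a hypothetical semantic-value type named atom: the closed type [ `A | `B ] is acceptable to the new code back-end, whereas an open type such as [> `A | `B ] is not. A closed annotation costs little flexibility, because an explicit coercion (:>) still widens the value wherever a larger variant type (here the hypothetical type value) is expected.

```ocaml
(* Hypothetical semantic-value type for a nonterminal: a CLOSED
   polymorphic variant. An open type such as [> `A | `B ] would be
   rejected by the new code back-end. *)
type atom = [ `A | `B ]

(* A wider, also closed, type used elsewhere in a hypothetical program. *)
type value = [ `A | `B | `C of int ]

(* Stand-in for a semantic action whose result type is the closed [atom]. *)
let make_atom (b : bool) : atom = if b then `A else `B

(* An explicit coercion widens an [atom] into a [value]. *)
let to_value (a : atom) : value = (a :> value)

let describe (v : value) : string =
  match v with
  | `A -> "A"
  | `B -> "B"
  | `C n -> "C " ^ string_of_int n

let () =
  print_endline (describe (to_value (make_atom true)));
  print_endline (describe (to_value (make_atom false)))
```

Running this prints A and then B.
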

The original code back-end, which has been around since the early days of
Menhir (2005), temporarily remains available (using --code-ancient). It
will be removed at some point in the future.

The new code back-end offers several levels of optimization, which remain
undocumented and are subject to change in the future. At present, the main
levels are roughly as follows:
- -O 0 --represent-everything uses a uniform representation of the stack
  and produces straightforward code.
- -O 0 uses a non-uniform representation of the stack; some stack cells
  have fewer fields; some stack cells disappear altogether.
- -O 1 reduces memory traffic by moving PUSH operations so that they
  meet POP operations and cancel out.
- -O 2 optimizes the reduction of unit productions (that is, productions
  whose right-hand side has length 1) by performing a limited amount of
  code specialization.
The default level of optimization is the maximum level, -O 2.
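
The idea behind -O 1 ("PUSH operations meet POP operations and cancel out") can be pictured with a toy OCaml model of a parser stack. This is a hedged illustration, not the code that Menhir actually generates: the stack type, the allocation counter, and the doubling "semantic action" are all hypothetical. In the naive version a value is pushed and then immediately popped by a reduction; in the optimized version the push has been moved forward until it meets the pop, and both disappear.

```ocaml
(* Toy model of a parser stack: each cell holds one semantic value.
   This is NOT Menhir's generated code, only an illustration. *)
type stack =
  | Empty
  | Cell of int * stack

(* Counts allocated cells, to make "memory traffic" visible. *)
let allocations = ref 0

let push (v : int) (s : stack) : stack =
  incr allocations;
  Cell (v, s)

(* Naive: shift [x] (PUSH), then a reduction POPs it and PUSHes the
   result of a hypothetical semantic action that doubles its argument. *)
let shift_then_reduce_naive (s : stack) (x : int) : stack =
  let s = push x s in
  match s with
  | Cell (v, rest) -> push (v * 2) rest
  | Empty -> assert false

(* Optimized: the PUSH of [x] and the POP cancel out;
   only the result cell is allocated. *)
let shift_then_reduce_optimized (s : stack) (x : int) : stack =
  push (x * 2) s

let () =
  allocations := 0;
  let s1 = shift_then_reduce_naive Empty 21 in
  let naive = !allocations in
  allocations := 0;
  let s2 = shift_then_reduce_optimized Empty 21 in
  let optimized = !allocations in
  assert (s1 = s2);
  Printf.printf "naive: %d allocations, optimized: %d allocation\n"
    naive optimized
```

Both versions build the same stack; the optimized one allocates one cell instead of two.
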

The new command line switch --exn-carries-state causes the exception
Error to carry an integer parameter: exception Error of int. When the
parser detects a syntax error, the number of the current state is reported
in this way. This allows the caller to select a suitable syntax error
message, along the lines described in Section 11 of the manual. This
command line switch is currently supported by the code back-end only.
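
On the caller's side, a state-carrying exception might be consumed as in the following sketch. The module Parser below is a local stand-in for a Menhir-generated parser compiled with --exn-carries-state, the state numbers 17 and 42 are made up (real numbers depend on the grammar's automaton), and the parser is abstracted as a thunk; with a real parser one would pass something like fun () -> Parser.main Lexer.token lexbuf, where main and Lexer.token are hypothetical names.

```ocaml
(* Local stand-in for a generated parser module: under
   --exn-carries-state it declares [exception Error of int]. *)
module Parser = struct
  exception Error of int
end

(* Hypothetical table from state numbers to messages. *)
let message_of_state (state : int) : string =
  match state with
  | 17 -> "Syntax error: an expression was expected after this operator."
  | 42 -> "Syntax error: unbalanced parenthesis."
  | _ -> Printf.sprintf "Syntax error (state %d)." state

(* Run a parser and turn the state-carrying exception into a message.
   [Ok] and [Error] here are the constructors of Stdlib's [result]. *)
let parse_with_messages (parse : unit -> 'a) : ('a, string) result =
  match parse () with
  | v -> Ok v
  | exception Parser.Error state -> Error (message_of_state state)

let () =
  let failing () : int = raise (Parser.Error 42) in
  let succeeding () : int = 7 in
  (match parse_with_messages failing with
   | Ok _ -> print_endline "parsed"
   | Error msg -> print_endline msg);
  match parse_with_messages succeeding with
  | Ok n -> Printf.printf "parsed %d\n" n
  | Error msg -> print_endline msg
```

Keeping the message table separate from the parser makes it easy to refine messages state by state without touching the grammar.
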

The $syntaxerror keyword is no longer supported.

Document the trick of wrapping module aliases in open struct ... end,
like this: %{ open struct module M = MyLongModuleName end %}.
This allows you to use the short name M in your grammar, but forces
OCaml to infer types that refer to the long name MyLongModuleName.
(Suggested by Frédéric Bour.)
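
The same trick can be tried in plain OCaml. In this minimal sketch, MyLongModuleName is a hypothetical stand-in module; in a grammar, the open struct line would sit inside the %{ ... %} header. Because the alias M lives inside open struct ... end, it is not part of the compilation unit's signature, so types inferred for the definitions that follow must be expressed in terms of MyLongModuleName. The open struct construct requires OCaml 4.08 or later.

```ocaml
(* Hypothetical stand-in for a module with a long name. *)
module MyLongModuleName = struct
  type t = Leaf | Node of t * t
  let leaf = Leaf
  let node l r = Node (l, r)
end

(* The alias is usable below but is not exported by this unit. *)
open struct
  module M = MyLongModuleName
end

(* Code written with the short name, as a semantic action would be. *)
let pair = M.node M.leaf M.leaf

let rec size (t : M.t) : int =
  match t with
  | M.Leaf -> 1
  | M.Node (l, r) -> 1 + size l + size r

let () = Printf.printf "size = %d\n" (size pair)
```

Running this prints size = 3.
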
