Dear OCaml & Menhir users,
I am pleased to announce a new release of Menhir, with a major improvement.
The code back-end has been rewritten from the ground up by Émile Trotignon
and by myself, and now produces efficient and well-typed OCaml code. The
infamous Obj.magic is not used any more.
Furthermore, the new code back-end produces code that is more aggressively
optimized, leading to a significant reduction in memory allocation and a
typical performance improvement of up to 20% compared to the previous code
back-end.
    opam update
    opam install menhir.20211230
Happy well-typed parsing in 2022!
2021/12/30
- The code back-end has been rewritten from the ground up by Émile Trotignon
  and François Pottier, and now produces efficient and well-typed OCaml
  code. The infamous `Obj.magic` is not used any more.

  The table back-end and the Coq back-end are unaffected by this change.

  The main side effects of this change are as follows:
  - The code back-end now needs type information. This means that
    either Menhir’s type inference mechanism must be enabled
    (the easiest way of enabling it is to use Menhir via `dune`
    and to check that the `dune-project` file says
    `(using menhir 2.0)` or later)
    or the type of every nonterminal symbol must be
    explicitly given via a `%type` declaration.

  - The code back-end no longer allows the type of any symbol to be an
    open polymorphic variant type, such as ``[> `A ]``. As a workaround,
    we suggest using a closed polymorphic variant instead.

  - The code back-end now adheres to the simplified error-handling strategy,
    as opposed to the legacy strategy.

    For grammars that do not use the `error` token, this makes no difference.

    For grammars that use the `error` token in the limited way permitted by
    the simplified strategy, this makes no difference either. The simplified
    strategy makes the following requirement: the `error` token should always
    appear at the end of a production, whose semantic action should abort the
    parser by raising an exception.

    Grammars that make more complex use of the `error` token, and therefore
    need the `legacy` strategy, cannot be compiled by the new code back-end.
    As a workaround, it is possible to switch to the table back-end (using
    `--table --strategy legacy`) or to the ancient code back-end (using
    `--code-ancient`). In the long run, we recommend abandoning the use of
    the `error` token. Support for the `error` token may be removed
    entirely at some point in the future.
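
As a rough illustration of the restriction on open polymorphic variant types noted in the list above, the OCaml sketch below shows what a closed type might look like. The type `value` and the functions `make_a`, `make_b` and `describe` are hypothetical names chosen for this example; they stand in for the type of a nonterminal symbol and for semantic actions, and are not taken from Menhir or from any real grammar.

```ocaml
(* Hypothetical semantic-value type for a nonterminal symbol.
   A closed polymorphic variant type lists all of its tags. *)
type value = [ `A | `B of int ]

(* An open type such as [> `A ] carries a hidden row variable;
   this is the kind of type that the new code back-end rejects. *)

(* Hypothetical semantic actions, annotated with the closed type. *)
let make_a () : value = `A
let make_b (n : int) : value = `B n

(* A consumer of the semantic value; the match is exhaustive
   because the type is closed. *)
let describe (v : value) : string =
  match v with
  | `A -> "A"
  | `B n -> "B " ^ string_of_int n
```

With a closed type like this one, the type of the symbol can be written out in full, for instance in a `%type` declaration, without mentioning any type variable.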
  The original code back-end, which has been around since the early days of
  Menhir (2005), temporarily remains available (using `--code-ancient`). It
  will be removed at some point in the future.

  The new code back-end offers several levels of optimization, which remain
  undocumented and are subject to change in the future. At present, the main
  levels are roughly as follows:

  - `-O 0 --represent-everything` uses a uniform representation of the stack
    and produces straightforward code.
  - `-O 0` uses a non-uniform representation of the stack; some stack cells
    have fewer fields; some stack cells disappear altogether.
  - `-O 1` reduces memory traffic by moving `PUSH` operations so that they
    meet `POP` operations and cancel out.
  - `-O 2` optimizes the reduction of unit productions (that is, productions
    whose right-hand side has length 1) by performing a limited amount of
    code specialization.

  The default level of optimization is the maximum level, `-O 2`.
- The new command line switch `--exn-carries-state` causes the exception
  `Error` to carry an integer parameter: `exception Error of int`. When the
  parser detects a syntax error, the number of the current state is reported
  in this way. This allows the caller to select a suitable syntax error
  message, along the lines described in Section 11 of the manual. This
  command line switch is currently supported by the code back-end only.
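
The following OCaml sketch suggests how a caller might use the state number carried by `Error` when the parser is generated with `--exn-carries-state`. It is only an illustration: the `Parser` module below is a hand-written stand-in for a generated parser, its entry point `main` is hypothetical (a real entry point takes a lexer and a lexing buffer), and the state number 42 and the messages are invented for the example.

```ocaml
(* Stand-in for a parser generated with --exn-carries-state.
   A real generated module would define this exception itself. *)
module Parser = struct
  exception Error of int

  (* Hypothetical entry point: pretends that parsing fails in
     state 42 unless the input is exactly "ok". *)
  let main (input : string) : unit =
    if input <> "ok" then raise (Error 42)
end

(* Hypothetical table that maps state numbers to error messages. *)
let message_of_state (state : int) : string =
  match state with
  | 42 -> "Expected an expression."
  | _ -> Printf.sprintf "Syntax error (state %d)." state

(* Run the parser and turn a syntax error into a readable message. *)
let parse (input : string) : (unit, string) result =
  match Parser.main input with
  | () -> Ok ()
  | exception Parser.Error state -> Error (message_of_state state)
```

Keeping the mapping from states to messages in a separate function means that the messages can be revised without touching the code that invokes the parser.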
- The `$syntaxerror` keyword is no longer supported.

- Document the trick of wrapping module aliases in `open struct ... end`,
  like this: `%{ open struct module M = MyLongModuleName end %}`.
  This allows you to use the short name `M` in your grammar, but forces
  OCaml to infer types that refer to the long name `MyLongModuleName`.
  (Suggested by Frédéric Bour.)
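
To show the module alias trick outside of a grammar, here is a small plain-OCaml sketch. The contents of `MyLongModuleName` (the type `expr` and the function `eval`) are invented for the example, and the sketch assumes OCaml 4.08 or later, where `open struct ... end` is accepted.

```ocaml
(* Hypothetical module with a long name, standing in for a real one. *)
module MyLongModuleName = struct
  type expr = Int of int | Add of expr * expr

  let rec eval (e : expr) : int =
    match e with
    | Int n -> n
    | Add (e1, e2) -> eval e1 + eval e2
end

(* The alias is wrapped in [open struct ... end], so the short name M
   is usable below but is not exported as a component of this module. *)
open struct
  module M = MyLongModuleName
end

(* The short name is used here; the value has type MyLongModuleName.expr. *)
let three : M.expr = M.Add (M.Int 1, M.Int 2)
let result = M.eval three
```

In a grammar, the same `open struct ... end` phrase would sit in the `%{ ... %}` header, as in the changelog entry above.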