Dear OCaml & Menhir users,
I am pleased to announce a new release of Menhir, with a major improvement.
The code back-end has been rewritten from the ground up by Émile Trotignon
and by myself, and now produces efficient and well-typed OCaml code. The
infamous Obj.magic is not used any more.
Furthermore, the new code back-end produces code that is more aggressively
optimized, leading to a significant reduction in memory allocation and a
typical performance improvement of up to 20% compared to the previous code
back-end.
opam update
opam install menhir.20211230
Happy well-typed parsing in 2022!
2021/12/30
- The code back-end has been rewritten from the ground up by Émile Trotignon
  and François Pottier, and now produces efficient and well-typed OCaml
  code. The infamous Obj.magic is not used any more.

  The table back-end and the Coq back-end are unaffected by this change.

  The main side effects of this change are as follows:
  - The code back-end now needs type information. This means that either
    Menhir’s type inference mechanism must be enabled (the easiest way of
    enabling it is to use Menhir via dune and to check that the dune-project
    file says (using menhir 2.0) or later) or the type of every nonterminal
    symbol must be explicitly given via a %type declaration.

  - The code back-end no longer allows the type of any symbol to be an
    open polymorphic variant type, such as [> `A ]. As a workaround,
    we suggest using a closed polymorphic variant instead.

  - The code back-end now adheres to the simplified error-handling strategy,
    as opposed to the legacy strategy.

    For grammars that do not use the error token, this makes no difference.

    For grammars that use the error token in the limited way permitted by
    the simplified strategy, this makes no difference either. The simplified
    strategy makes the following requirement: the error token should always
    appear at the end of a production, whose semantic action should abort the
    parser by raising an exception.

    Grammars that make more complex use of the error token, and therefore
    need the legacy strategy, cannot be compiled by the new code back-end.
    As a workaround, it is possible to switch to the table back-end (using
    --table --strategy legacy) or to the ancient code back-end (using
    --code-ancient). In the long run, we recommend abandoning the use of
    the error token. Support for the error token may be removed
    entirely at some point in the future.
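
To make the restriction on open polymorphic variant types in the list above more concrete, here is a minimal OCaml sketch of the difference between an open and a closed type. The names token_kind, describe and example are hypothetical and exist only for this illustration; they are not part of Menhir.

```ocaml
(* A closed polymorphic variant type: exactly these two tags and no others.
   This is the kind of type suggested as a workaround. *)
type token_kind = [ `A | `B ]

(* By contrast, an open type such as [> `A ] means: the tag `A and possibly
   other tags. It carries a hidden type variable, and this is the kind of
   type that the new code back-end no longer accepts for a symbol. *)

(* A function over the closed type. *)
let describe (k : token_kind) : string =
  match k with
  | `A -> "A"
  | `B -> "B"

(* Without the annotation, OCaml would infer the open type [> `A ] here.
   The annotation pins the value to the closed type. *)
let example : token_kind = `A
```

In a grammar, the same idea would presumably apply to the type given in a %type declaration: naming a closed type there avoids the open type that inference might otherwise produce.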
  The original code back-end, which has been around since the early days of
  Menhir (2005), temporarily remains available (using --code-ancient). It
  will be removed at some point in the future.

  The new code back-end offers several levels of optimization, which remain
  undocumented and are subject to change in the future. At present, the main
  levels are roughly as follows:

  - -O 0 --represent-everything uses a uniform representation of the stack
    and produces straightforward code.

  - -O 0 uses a non-uniform representation of the stack; some stack cells
    have fewer fields; some stack cells disappear altogether.

  - -O 1 reduces memory traffic by moving PUSH operations so that they
    meet POP operations and cancel out.

  - -O 2 optimizes the reduction of unit productions (that is, productions
    whose right-hand side has length 1) by performing a limited amount of
    code specialization.

  The default level of optimization is the maximum level, -O 2.
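
The idea behind -O 1, where a PUSH that meets a POP cancels out, can be pictured with a toy example. The sketch below is not Menhir's implementation: the instruction type and the cancel function are invented for illustration, the step that moves PUSH operations toward POP operations is left out, and the data carried by stack cells is ignored.

```ocaml
(* Toy instruction set, invented for this illustration only. *)
type instr =
  | Push of string   (* push the named value onto the stack *)
  | Pop              (* pop the top stack cell *)
  | Other of string  (* any instruction that does not touch the stack *)

(* Remove every PUSH that is immediately followed by a POP.
   The accumulator holds the processed prefix in reverse order, so a pair
   that becomes adjacent after an earlier cancellation is also removed. *)
let cancel (code : instr list) : instr list =
  let step acc i =
    match i, acc with
    | Pop, Push _ :: rest -> rest   (* PUSH meets POP: both disappear *)
    | _ -> i :: acc
  in
  List.rev (List.fold_left step [] code)
```

For instance, under these assumptions, the sequence Push "a"; Push "b"; Pop; Pop; Other "x" reduces to Other "x" alone, so neither stack cell is ever allocated.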
- The new command line switch --exn-carries-state causes the exception
  Error to carry an integer parameter: exception Error of int. When the
  parser detects a syntax error, the number of the current state is reported
  in this way. This allows the caller to select a suitable syntax error
  message, along the lines described in Section 11 of the manual. This
  command line switch is currently supported by the code back-end only.

- The $syntaxerror keyword is no longer supported.

- Document the trick of wrapping module aliases in open struct ... end,
  like this: %{ open struct module M = MyLongModuleName end %}.
  This allows you to use the short name M in your grammar, but forces
  OCaml to infer types that refer to the long name MyLongModuleName.
  (Suggested by Frédéric Bour.)
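
As an illustration of the --exn-carries-state switch described earlier in this list, here is a minimal sketch of how a caller might turn the state number into a message. Everything in it is an assumption made for the example: the Parser module is a hand-written stand-in for a generated parser, its main function is a fake entry point, and the state numbers and messages are invented.

```ocaml
(* Stand-in for a parser generated with --exn-carries-state. A real
   generated module would expose entry points that take a lexer and a
   lexbuf; this fake one only mimics the shape of the exception. *)
module Parser = struct
  exception Error of int

  (* Hypothetical entry point: fails in state 3 on empty input. *)
  let main (input : string) : int =
    if input = "" then raise (Error 3) else String.length input
end

(* Hypothetical table from state numbers to messages. The numbers and
   texts are made up; real ones depend on the grammar's automaton. *)
let message_of_state (state : int) : string =
  match state with
  | 3 -> "Expected an expression."
  | 7 -> "Expected a closing parenthesis."
  | _ -> "Syntax error."

(* Run the parser and convert a syntax error into a readable message. *)
let parse (input : string) : (int, string) result =
  match Parser.main input with
  | v -> Ok v
  | exception Parser.Error state ->
      Error (Printf.sprintf "state %d: %s" state (message_of_state state))
```

The point of the design is that the caller needs only the integer carried by the exception to choose a message; how the table itself is produced is left to the approach described in the manual.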