Implementing an include statement in menhir > 20211123

Some time ago I got help implementing an include statement in my custom Menhir-based language, and this worked fine:

http://git.annexia.org/?p=goals.git;a=blob;f=src/parser.mly;h=9b1a91247767be286ec48f691b1c54bca8432837;hb=HEAD#l24

However, with the “new code generator”, this no longer works:

File "src/parser.mly", line 102, characters 33-37:
Error: Unbound value file

Apparently a semantic action can no longer pass a non-terminal (here, file) to a function.

Is there still a way to implement include functionality in Menhir parsers?

According to the documentation, this diff should be enough:

diff --git a/Makefile.am b/Makefile.am
index d57d376..be47db4 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -34,7 +34,10 @@ maintainer-srpm maintainer-fedora-copr: src/goals
 # goals itself (see Goalfile.in).

 src/goals:
-       $(MENHIR) --explain --code-ancient src/parser.mly
+       $(OCAMLFIND) opt $(OCAMLFLAGS) $(OCAMLPACKAGES) -I . -I src src/ast.mli -c -o src/ast.cmi
+       $(OCAMLFIND) opt $(OCAMLFLAGS) $(OCAMLPACKAGES) -I . -I src src/utils.mli -c -o src/utils.cmi
+       $(OCAMLFIND) opt $(OCAMLFLAGS) $(OCAMLPACKAGES) -I . -I src src/cmdline.mli -c -o src/cmdline.cmi
+       cd src && $(MENHIR) --explain --infer parser.mly
 # Hack required to break circular dependencies.
        echo 'val lexer_read : (Lexing.lexbuf -> token) option ref' >> src/parser.mli
        echo 'val eval_substitute : (Ast.env -> Ast.loc -> Ast.substs -> string) option ref' >> src/parser.mli
diff --git a/src/parser.mly b/src/parser.mly
index 9b1a912..ab29f6e 100644
--- a/src/parser.mly
+++ b/src/parser.mly
@@ -86,7 +86,7 @@ let do_include env loc filename optflag file =
 %token PREDICATE

 (* Start nonterminals. *)
-%start <Ast.expr Ast.Env.t> file
+%type <Ast.expr Ast.Env.t> file
 %start <Ast.expr> expr_only
 %%

However, using %type instead of %start removes file from parser.mli, so the build fails later, after menhir has run. This seems contrary to the documentation (Menhir Reference Manual, version 20211230), which states:

The types provided as part of %type declarations are copied verbatim to the .ml and .mli files.

cc @fpottier

Adding both %type and %start declarations nearly works, but:

File "src/parser.mly", line 90, characters 28-32:
File "src/parser.mly", line 89, characters 27-31:
Error: there are multiple %type declarations for the symbol file.

I think this is only indirectly related to the “new code generator”. When --infer generates a first version of the code for ocamlc to type-check, the non-terminals are not yet defined as functions, and there is no mutual recursion. This behavior predates the new code generator; what has changed is that --infer is now mandatory unless you give explicit %type declarations for all non-terminals.

I can suggest two solutions:

  • you may declare let file_ref = ref None at the beginning of parser.mly, in the same vein as lexer_read and eval_substitute. You can then refer to Option.get !file_ref in semantic actions, and you just have to take care to assign file_ref := Some file before parsing.

  • you may just add let file _ = failwith "not yet available" at the beginning of parser.mly. This declaration is enough for type inference to succeed, and it will be shadowed by the let rec that menhir introduces in the final code, so the semantic action will still refer to the intended definition of file.
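A minimal sketch of the first suggestion, written in plain OCaml so it runs without Menhir. The toy parse_file function stands in for the Menhir-generated entry point; the env type, the in-memory files table and the line format are all hypothetical. In the real parser.mly, file_ref would presumably hold the generated entry point, whose type with the traditional API would be (Lexing.lexbuf -> token) -> Lexing.lexbuf -> Ast.expr Ast.Env.t.

```ocaml
(* Hypothetical stand-in for Ast.expr Ast.Env.t: a list of name/value pairs. *)
type env = (string * string) list

(* Forward reference declared in the parser header, like lexer_read.
   It is filled in with the real entry point before any parsing starts. *)
let file_ref : (string -> env) option ref = ref None

(* Hypothetical in-memory "filesystem" so the sketch needs no I/O. *)
let files : (string, string list) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.replace files "main" [ "include lib"; "a=1" ];
  Hashtbl.replace files "lib" [ "b=2" ]

(* What the include semantic action does: it cannot name the entry point
   directly (it is not in scope yet), so it goes through the reference. *)
let do_include (filename : string) : env =
  let parse = Option.get !file_ref in
  parse filename

(* Toy "grammar": "include NAME" or "key=value"; anything else is ignored. *)
let parse_line (line : string) : env =
  match String.split_on_char ' ' line with
  | [ "include"; f ] -> do_include f
  | _ -> (
      match String.index_opt line '=' with
      | Some i ->
          [ (String.sub line 0 i,
             String.sub line (i + 1) (String.length line - i - 1)) ]
      | None -> [])

(* Stand-in for the generated entry point; defined after do_include. *)
let parse_file (filename : string) : env =
  List.concat_map parse_line (Hashtbl.find files filename)

(* Tie the knot before parsing anything. *)
let () = file_ref := Some parse_file
```

Here parse_file "main" returns [("b", "2"); ("a", "1")], because the include line comes before a=1. The point of the pattern is that do_include never names parse_file directly, so it type-checks even while the entry point is not yet in scope.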

It compiles, but the failwith is raised when I run the actual program.

Sorry, the second suggestion indeed does not work with the new code generator, which places the code of semantic actions before the definitions of the functions associated with the non-terminals. You may still try the first solution (introducing a file_ref)…
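The failure can be reproduced with ordinary OCaml scoping, assuming (as described above) that the generated code emits semantic actions before the non-terminal functions. The names file and action_include are illustrative only. A function captures whichever binding is in scope where it is defined, so a later definition that shadows the stub has no effect on it.

```ocaml
(* The stub from the second suggestion, placed in the parser header. *)
let file (_ : string) : int = failwith "not yet available"

(* Semantic actions are emitted here, before the non-terminal functions,
   so this reference to file resolves to the stub above. *)
let action_include (name : string) : int = file name

(* The "real" definition comes later. It shadows the stub only for code
   that follows it, not for action_include. *)
let file (name : string) : int = String.length name
```

Calling file "abc" now returns 3, but action_include "abc" still raises Failure "not yet available".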

If a semantic action needs to call the parser recursively, then a clean (?) solution would be to make the parser a functor parameterized by the parser itself (use %parameter to do this), then use module rec to tie the recursive knot. Caveat: I have not tried it…
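An untested sketch of that idea in plain OCaml. Make stands in for the functor Menhir would generate from a %parameter declaration, and PARSER, sources and the line-based grammar are hypothetical. The signature contains only functions, which is what lets module rec accept Parser = Make (Parser). The recursive call is safe because Make does not invoke P.file while the module is being initialized.

```ocaml
(* Hypothetical signature of the parser's entry point. *)
module type PARSER = sig
  val file : string -> string list
end

(* Hypothetical in-memory sources so the sketch needs no I/O. *)
let sources : (string * string list) list =
  [ ("main", [ "x"; "include lib"; "y" ]); ("lib", [ "z" ]) ]

(* Stand-in for a parser generated with %parameter: it is a functor,
   and its "include" action calls back into the parameter P. *)
module Make (P : PARSER) : PARSER = struct
  let file name =
    List.concat_map
      (fun line ->
        match String.split_on_char ' ' line with
        | [ "include"; f ] -> P.file f (* recursive call via the parameter *)
        | _ -> [ line ])
      (List.assoc name sources)
end

(* Tie the recursive knot: the parser is applied to itself. *)
module rec Parser : PARSER = Make (Parser)
```

Parser.file "main" yields ["x"; "z"; "y"]. With real Menhir output, PARSER would have to describe the generated entry point instead of this toy string-based one.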