Back in the day I was helped with implementing an include statement in my custom menhir-based language, and this worked fine:
http://git.annexia.org/?p=goals.git;a=blob;f=src/parser.mly;h=9b1a91247767be286ec48f691b1c54bca8432837;hb=HEAD#l24
However, with the “new code generator” this no longer works:
File "src/parser.mly", line 102, characters 33-37:
Error: Unbound value file
Apparently you can no longer pass a non-terminal to a function.
Is there now a way to implement include functionality in menhir parsers?
According to the documentation this diff should be enough:
diff --git a/Makefile.am b/Makefile.am
index d57d376..be47db4 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -34,7 +34,10 @@ maintainer-srpm maintainer-fedora-copr: src/goals
# goals itself (see Goalfile.in).
src/goals:
- $(MENHIR) --explain --code-ancient src/parser.mly
+ $(OCAMLFIND) opt $(OCAMLFLAGS) $(OCAMLPACKAGES) -I . -I src src/ast.mli -c -o src/ast.cmi
+ $(OCAMLFIND) opt $(OCAMLFLAGS) $(OCAMLPACKAGES) -I . -I src src/utils.mli -c -o src/utils.cmi
+ $(OCAMLFIND) opt $(OCAMLFLAGS) $(OCAMLPACKAGES) -I . -I src src/cmdline.mli -c -o src/cmdline.cmi
+ cd src && $(MENHIR) --explain --infer parser.mly
# Hack required to break circular dependencies.
echo 'val lexer_read : (Lexing.lexbuf -> token) option ref' >> src/parser.mli
echo 'val eval_substitute : (Ast.env -> Ast.loc -> Ast.substs -> string) option ref' >> src/parser.mli
diff --git a/src/parser.mly b/src/parser.mly
index 9b1a912..ab29f6e 100644
--- a/src/parser.mly
+++ b/src/parser.mly
@@ -86,7 +86,7 @@ let do_include env loc filename optflag file =
%token PREDICATE
(* Start nonterminals. *)
-%start <Ast.expr Ast.Env.t> file
+%type <Ast.expr Ast.Env.t> file
%start <Ast.expr> expr_only
%%
However, it seems that using %type instead of %start removes it from parser.mli, and the build thus fails later, after menhir is called. This seems contrary to the documentation (Menhir Reference Manual (version 20230608)), which states:
The types provided as part of %type declarations are copied verbatim to the .ml and .mli files.
cc @fpottier
That nearly works if I add both %type and %start statements, but:
File "src/parser.mly", line 90, characters 28-32:
File "src/parser.mly", line 89, characters 27-31:
Error: there are multiple %type declarations for the symbol file.
I think that it is only indirectly related to the “new code generator”: when a first version of the code is generated by --infer to be typed by ocamlc, non-terminals are not defined as functions yet, and there is no mutual recursion. This behavior predates the new code generator; what the new code generator changes is that --infer is now mandatory unless we give an explicit %type declaration to every non-terminal.
I can suggest two solutions:
- you may declare a let file_ref = ref None at the beginning of parser.mly, in the same vein as lexer_read and eval_substitute: you can then refer to Option.get !file_ref in semantic actions, and you just have to take care of assigning file_ref := Some file before parsing.
- you may just add let file _ = failwith "not yet available" at the beginning of parser.mly: this declaration will be enough for the type inference to succeed, and it will be shadowed by the let rec introduced by menhir in the final code, so the semantic action will still refer to the expected definition of file.
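The file_ref idea from the first suggestion can be sketched in plain OCaml, outside Menhir. In this sketch, parse_file, do_include's single argument and the sources table are hypothetical stand-ins (for the generated file entry point, the real do_include, and the file system); only the forward-reference pattern itself comes from the suggestion. In the real grammar the reference would presumably hold the generated file function instead.

```ocaml
(* Forward reference, declared before any code that needs the parser.
   In parser.mly this would sit in the header, next to lexer_read. *)
let file_ref : (string -> string list) option ref = ref None

(* Hypothetical stand-in for the file system: name -> lines. *)
let sources = [
  "main", ["a"; "include lib"; "b"];
  "lib",  ["x"; "y"];
]

(* Plays the role of a semantic action: it cannot name the parser
   directly, so it calls it through the reference. *)
let do_include name =
  match !file_ref with
  | Some parse -> parse name
  | None -> failwith "file_ref has not been initialised"

(* Plays the role of the generated parser, defined after the actions. *)
let parse_file name =
  List.assoc name sources
  |> List.concat_map (fun line ->
       match String.split_on_char ' ' line with
       | ["include"; target] -> do_include target
       | _ -> [line])

(* Tie the knot once, before the first parse. *)
let () = file_ref := Some parse_file

let result = parse_file "main"
```

The only ordering constraint is that the assignment to file_ref happens before the first include is evaluated; forgetting it shows up as a runtime failure rather than a type error.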
It compiles, but the failwith is raised when I run the actual program.
Sorry, the second suggestion indeed does not work with the new code generator, which emits the code of the semantic actions before defining the functions associated with the non-terminals. You may still try the first solution (introduce a file_ref)…
If a semantic action needs to recursively call the parser, then a clean (?) solution would be to make the parser a functor, parameterized by the parser itself (use %parameter to do this); then use module rec to tie the recursive knot. Caveat: I have not tried it…
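The shape of that functor-plus-module rec idea can be shown in plain OCaml. This is only a sketch of the knot-tying itself: the PARSER signature, the Make functor written by hand, and the sources table are all hypothetical, and whether the same arrangement goes through with a functor generated by Menhir from a %parameter declaration is untested here.

```ocaml
(* Hypothetical signature of the part of the parser that actions need. *)
module type PARSER = sig
  val file : string -> string list
end

(* Hypothetical stand-in for the file system: name -> lines. *)
let sources = [
  "main", ["a"; "include lib"; "b"];
  "lib",  ["x"; "y"];
]

(* Stand-in for the parser functor: it receives "itself" as Self and
   calls Self.file wherever an include has to be parsed. *)
module Make (Self : PARSER) : PARSER = struct
  let file name =
    List.assoc name sources
    |> List.concat_map (fun line ->
         match String.split_on_char ' ' line with
         | ["include"; target] -> Self.file target
         | _ -> [line])
end

(* Tie the recursive knot: P is the parser applied to itself. This is
   accepted because every value in PARSER has a function type. *)
module rec P : PARSER = Make (P)
```

Compared with the file_ref approach, nothing has to be assigned at runtime before parsing, at the cost of writing down a signature for the recursive argument.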