Menhir : production nonempty_list(STRING) -> STRING is never reduced

Hi,

I am trying to implement a parser for entries (entries are similar to terminal entries) that transform an entry into a list of string.

For instance, transform “git commit -m \”first commit\”” into [“git”; “commit”; “-m”; “\”first commit\””]

I have implemented a parser and a lexer but for a reason I do not understand, it fails on sample input.

My parser is :

%token <string> STRING
%token EOF
%start <Entry.value option> prog
%type <string list> list_tokens
%%

prog:
  | v = list_tokens { Some v }
  | EOF       { None   } ;

list_tokens:
  | vl = nonempty_list(STRING) { vl }

when I try the command menhir entry_tokenizer/parser.mly,

I get the warning :

...
Warning: production nonempty_list(STRING) -> STRING is never reduced.
Warning: in total, 1 production is never reduced.
...
The types of the following nonterminal symbols are unknown:
nonempty_list(STRING)

I do not get what is wrong with this parser since the rule “vl = nonempty_list(STRING) { vl }” is unambiguous in my opinion ?

The code with the full example and test is at GitHub - korkorran/entry_tokenizer

After some tests, I finally fixed it :
the first rule for ‘prog’ contains an EOF token at the end :

prog:
  | v = list_tokens; EOF { Some v }
  | EOF       { None   } ;

It fixed my case.

thanks for your reading

I don’t know about Menhir specifically but is in

prog:
  | v = list_tokens; EOF { Some v }
  | EOF       { None   } ;

an implicit empty case between : and |? I would expect | to separate productions and in that case you have three and not two, as you might assume. For example, I am writing my productions for ocamlyacc like this:

main:
      expr EOL              { Ast.Expr $1 }
    | stmt EOL              { match $1 with
                              id,expr -> Ast.Define (id, expr)
                            }
;

It is not a question of ambiguity. If your grammar is given by

prog:
| nonempty_list(STRING)
| EOF

then it accepts the language

EOF
STRING
STRING STRING
STRING STRING STRING
etc

After seeing a STRING, the automaton has to decide whether to request more input in the hopes of shifting a new STRING into the stack or reduce the top STRING via nonempty_list(STRING) -> STRING. However, Menhir (and OCamlYacc) always choose to request more input in this case so the reduction is never used (and so parsing will fail when the input runs out). See “end-of-stream conflicts” in the Menhir manual (section 6.4):

By adding an explicit EOF token:

prog:
| nonempty_list(STRING) EOF
| EOF

the automaton will reduce nonempty_list(STRING) -> STRING when the lookahead token is EOF, and then keep reducing all the way up to prog.

Makes sense?

Cheers,
Nicolas

1 Like

I don’t think so. As in OCaml, the leading vertical bar is optional. Empty cases are recognized by the presence of a semantic action:

prog:
| /* empty */ { ... }
...

Cheers,
Nicolas

1 Like