I am confused about the behavior of my Menhir parser. I isolated the confusing behavior to a minimal grammar:
%{
%}
%token DOT
%token <string> IDENT
%start<string list * string> path
%start<string> good
%%
path: list(IDENT DOT { $1 }) IDENT { $1, $2 }
good: IDENT { $1 }
I expect path
to accept strings like IDENT
, IDENT DOT IDENT
, IDENT DOT IDENT DOT IDENT
, etc. However, when running the Menhir interpreter, I get overshoot errors (and in my bigger project, the parse fails where I expect it to succeed due to the productions above).
Menhir gives a warning:
File "/home/REDACTED/.opam/4.07.0/lib/menhir/standard.mly", line 197, characters 16-16:
Warning: production list(__anonymous_0) -> is never reduced.
Warning: in total, 1 production is never reduced.
I don’t understand why. I computed the LR(1) state machine by hand:
GRAMMAR
S': path $
generated: IDENT DOT
list_generated:
|
| generated list_generated
path: list_generated IDENT
ITEM SETS
s0:
S': . path $ { } SHIFT s1
+ path: . list_generated IDENT { $ } GOTO s2
+ list_generated: . { IDENT } REDUCE s0
+ list_generated: . generated list_generated { IDENT } GOTO s3
+ generated: . IDENT DOT { IDENT } SHIFT s4
s1:
S': path . $ { } SHIFT s5
s2:
path: list_generated . IDENT { $ } SHIFT s6
s3:
list_generated: generated . list_generated { IDENT } GOTO s7
+ list_generated: . { IDENT } REDUCE s0
+ list_generated: . generated list_generated { IDENT } GOTO s3
+ generated: . IDENT DOT { IDENT } SHIFT s9
s4:
generated: IDENT . DOT { IDENT } SHIFT s8
s5:
S': path $ . { } ACCEPT
s6:
path: list_generated IDENT . { $ } REDUCE s0
s7:
list_generated: generated list_generated . { IDENT } REDUCE s0
s8:
generated: IDENT DOT . { IDENT } REDUCE s0
s9:
generated: IDENT . DOT { IDENT } SHIFT s10
s10:
generated: IDENT DOT . { IDENT } REDUCE s3
What am I doing wrong? Did I make a mistake in my grammar that I’m stupidly overlooking? Does Menhir have different semantics than traditional LR(1)?
EDIT: Perhaps it’s due to a shift/reduce conflict with IDENT? I just realized that this might be it, so I’m going to investigate more.
EDIT 2: The left-recursive
a:
| IDENT { $1 }
| a DOT IDENT { $1 }
also fails. Unfortunately, I don’t have the time to make state machines by hand for everything.