Juxtaposition in Menhir

I would like to use “simple” precedence rules to write a parser in Menhir for a grammar where juxtaposition is right-associative function application (like OCaml syntax, so e.g. f g x should be parsed Apply (Apply (Var "f", Var "g"), Var "x")). One might naïvely expect the following to work:

%token <string> VAR
%token EOF

%right APP

%start <Ast.t> main
%%

main : e = expr EOF { e }

expr:
  | e1=expr e2=expr %prec APP
    { Ast.Apply (e1,e2) }
  | x=VAR
    { Ast.Var x }

but alas

$ menhir lib/parser.mly 
File "lib/parser.mly", line 4, characters 0-6:
Warning: the precedence level assigned to APP is never useful.
File "lib/parser.mly", line 12, characters 26-29:
Warning: this %prec declaration is never useful.
Warning: one state has shift/reduce conflicts.
Warning: one shift/reduce conflict was arbitrarily resolved.

I know how to get this to work the hard way, using explicit “levels” of expressions, like this:

%token <string> VAR
%token LPAREN RPAREN
%token EOF

%start <Ast.t> main
%%

main : e = expr EOF { e }

expr:
  | e1=expr e2=expr1
    { Ast.Apply (e1,e2) }
  | e=expr1
    { e }

expr1:
  | x=VAR
    { Ast.Var x }
  | LPAREN e=expr RPAREN
    { e }

So, is there a way to get the same result but using precedence annotations?

Thanks!

This is left-associative. It would be right-asociative if the output was Apply (Var "f", Apply (Var "g", Var "x")).

Sure.

Shift/reduce conflicts occur when encountering a token. In your case, the parser stack already contains two expressions and the next token is VAR, so the parser has to take a decision. Does it ignore the token and reduce the stack using expr expr -> expr? Or does it shift the token (and later reduce it using VAR -> expr)? To decide how to solve this conflict, Menhir compares the precedence of VAR with the precedence of the rule expr expr -> expr.

If you only use %prec APP, then you are not telling Menhir anything, because APP and VAR have uncomparable precedences. If you change %right APP into

%nonassoc VAR
%right APP

then APP gets a higher precedence than VAR, which makes your grammar left-associative. In other words, whether you write %left or %right or %nonassoc here is actually meaningless. The grammar will always be left-associative. And conversely, if you swap the two lines, the grammar will always be right-associative, irrespective of the associativity you specify.

So, if you want the associativity to be taken into account, you need the precedence of the rule expr expr -> expr to be exactly the same as VAR:

%right VAR (* or %left VAR, depending on what you want *)
...
expr:
  | e1=expr e2=expr %prec VAR
  | x=VAR
5 Likes

This is left-associative.

Yes, sorry. Left associativity was my intention here. I have trouble with chirality in the real world too.

To decide how to solve this conflict, Menhir compares the precedence of VAR with the precedence of the rule expr expr -> expr .

This is a helpful and lucid explanation. Thank you very much!

I’m struggling to understand what’s going on here, and have tried to incorporate what is being discussed in several different ways.

But each time, I’ve failed to do so. I’ll produce my code below, which I’ve tried to minimize without significantly changing the behavior.

%{
open Ast
%}

%token <int> INT
%token <string> VAR
%token UNIT

%token OR
%left OR
%left AND
%left VAR UNIT INT
%left APP

%token FUN
%token ARROW
%token IF
%token THEN
%token ELSE

%token EOF

%start <Ast.prog> prog

%%

prog:
  | e = expr; EOF { e }
;

expr:
  | IF; e1 = expr; THEN; e2 = expr; ELSE; e3 = expr { If(e1, e2, e3) }
  | FUN; id = VAR; ARROW; e = expr { Fun(id, e) }
  | e = expr2 { e }

  expr2:
  | e1 = expr2; b = binop; e2 = expr2 { Binop(b, e1, e2) }
  | e1 = expr3 e2 = expr3 %prec APP { App(e1, e2) }
  | e = expr3 { e }

expr3:
  | UNIT { Unit }
  | i = INT { Int i }
  | s = VAR { Var s }
  | LPAREN; e = expr; RPAREN { e }

%inline binop: 
  | AND { And }
  | OR { Or }

As with ghulette’s code, I’m getting a warning that the %prec is never used.

I did my best to at least half-understand the explanation about why this happens, but I’m not sure exactly what it means that the conflict occurs “when encountering a token” – when encountering it in an input string? Or in one of the %token declarations? So my apologies, but I’m new to all of this and quite lost.

I think I understand that the fundamental issue is whether something like “x y z” is understood by the parser to be read first as expr (expr expr) or shift this to read (expr expr) expr. And if I understand the “parser stack” idea correctly, for the first one it adds “x” to the parser stack and then matches “x y” as expr expr, and therefore reads “x y z” as “x (y z)”. The alternative is to not put “x” on the parser stack, and this involves a kind of shift, which in turn leads to a shift in the parentheses.

I don’t completely understand how two things can have uncomparable precedences – is that because of how they were declared as tokens with things like %left? Or because of how they are structured in the grammar production rules?

Well, I’ve probably written too much as it is. In short, if anyone could advise about how to make e1 = expr3 e2 = expr3 %prec APP { App(e1, e2) } work, I’d appreciate it!

As it is now, it seems to give application right-associativity.

Thank you.

Yes, it is a token in the input stream. Menhir is (kind of) a LR(k) parser with k = 1, which means that it can read at most one token from the input stream before taking a decision. There are two kinds of decisions: either push this token on the top of the stack (a so-called shift), or pop some previously pushed values from the top of the stack, combine them according to a rule, and push the resulting value on the stack (a so-called reduce). At the end of the execution, there should be only one single value left in the stack, which is the whole parsed text.

No. Menhir first reads VAR(x) and pushes it onto the stack. It then reads VAR(y) and it has to take a decision: Does it reduces VAR(x) to expr3(x) on the stack? Let us suppose it does. (Actually, it has no choice here. Because if it does not, it will push VAR(y) onto VAR(x) which will never be popped back later on, hence preventing a successful parsing.) The parser is still in the process of reading VAR(y) and it has to take yet another decision: Does it reduce expr3(x) to expr2(x) on the stack? This time, it is a real decision. In one case, the stack will contain expr2(x) (with VAR(y) still being read), and in the other case, the stack will contain expr3(x); VAR(y). Both are perfectly valid stacks. This is a shift/reduce conflict: Menhir can either shift VAR(y) or reduce expr3(x).

That is where precedence comes into play. Menhir compares the precedence of VAR (the token being read) with expr2: e = expr3 (the considered reduction rule). The latter has no associated precedence, so Menhir does not know how to solve the conflict. By the way, notice how we never even reached the point where the rule expr2: e1 = expr3 e2 = expr3 %prec APP matters; we did not even read z on the input yet.

So, to fix the conflict, you need to set the precedence of the rule expr2: e = expr3, so that it can be compared to the precedence of VAR. See my previous post for more details on that.

I appreciate the help :grinning_face:. However, I’ve now tried about twenty other variations on placing directives in different places and seem to still not be getting it.

I don’t fully understand what these directives actually do. The only high-level explanations that I’ve been able to understand (i.e. not mentioning the ones I just don’t understand) is that things lower in the list of tokens are higher precedence. And things declared %left will break ties using left-associativity.

I believe I now understand the logic of the parser better, so thank you for that. But here was my best guess about how to implement the logic that you describe. I might still misunderstand the logic. But I suspect that really just don’t know which directives will implement the logic.

Anyway, among many other guesses, this is the one I thought most closely followed what you suggested above, but clearly it’s wrong.

%{
open Ast
%}

%token APP

%token <int> INT
%token <string> VAR
%token UNIT

%left VAR UNIT INT

%token OR
%left OR
%left AND
%token FUN
%token ARROW
%token IF
%token THEN
%token ELSE
%token EOF

%start <Ast.prog> prog

%%

prog:
  | e = expr; EOF { e }
;

expr:
  | IF; e1 = expr; THEN; e2 = expr; ELSE; e3 = expr { If(e1, e2, e3) }
  | FUN; id = VAR; ARROW; e = expr { Fun(id, e) }
  | e = expr2 { e }

expr2:
  | e1 = expr2; b = binop; e2 = expr2 { Binop(b, e1, e2) }
  | e1 = expr3 e2 = expr3 { App(e1, e2) }
  | e = expr3 { e } %prec VAR

expr3:
  | UNIT { Unit }
  | i = INT { Int i }
  | s = VAR { Var s }
  | LPAREN; e = expr; RPAREN { e }

%inline binop: 
  | AND { And }
  | OR { Or }

My explanations are only meaningful if Menhir complains that there is a conflict. Now that I look more closely at your grammar, it seems to be unambiguous. So, there is no conflict to solve.

In particular, I was wrong when I said that both [expr2(x)] . VAR(y) and [expr3(x); VAR(y)] . ? are valid states. Indeed, the leftmost case can never lead to a successful parse, according to your grammar. So, Menhir has no other choice than to shift here.

For the same reason, an input x y z (that is, VAR VAR VAR EOF, which you can try yourself by passing option --interpret to Menhir) can never be successfully parsed. To do so, your grammar needs a rule that accepts expr2 expr3, for example, but your grammar only accepts expr3 expr3.

1 Like

DOH! Thank you very much for your help, I have it working now.