Optional trailing commas (or other symbols) are quite a common feature in various programming languages. For example, Rust allows you to add a trailing comma at the end of an array like this ( Array and index expressions - The Rust Reference (rust-lang.org)):
I was working on supporting such a feature in a parser application but ran into problems regarding shift/reduce conflicts.
The first approach I tried is this:
%{
%}
%token LBRACK RBRACK COMMA
%token <int> VAR
%start<int list> bracket_list
%%
bracket_list:
LBRACK l = separated_list(COMMA, VAR) option(COMMA) RBRACK { l }
And I got conflict warnings:
** Conflict (shift/reduce) in state 2.
** Token involved: COMMA
** This state is reached from bracket_list after reading:
LBRACK VAR
** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)
bracket_list
(?)
** In state 2, looking ahead at COMMA, reducing production
** separated_nonempty_list(COMMA,VAR) -> VAR
** is permitted because of the following sub-derivation:
LBRACK loption(separated_nonempty_list(COMMA,VAR)) option(COMMA) RBRACK // lookahead token appears because option(COMMA) can begin with COMMA
separated_nonempty_list(COMMA,VAR) // lookahead token is inherited
VAR .
** In state 2, looking ahead at COMMA, shifting is permitted
** because of the following sub-derivation:
LBRACK loption(separated_nonempty_list(COMMA,VAR)) option(COMMA) RBRACK
separated_nonempty_list(COMMA,VAR)
VAR . COMMA separated_nonempty_list(COMMA,VAR)
My understanding is that the parser got confused because with a token VAR, it can reduce right in place, ignoring the option(COMMA) at the end. But with the token COMMA it can also shift to the next part of option(COMMA).
I then tried to specify the precedence of the token COMMA and the production so that shifting would be prioritized:
%{
%}
%token LOW_PREC
%nonassoc LOW_PREC
%token LBRACK RBRACK COMMA
%token <int> VAR
%nonassoc COMMA // higher precedence for the comma
%start<int list> bracket_list
%%
bracket_list:
LBRACK l = separated_list(COMMA, VAR) RBRACK { l } %prec LOW_PREC // lower precedence on this branch
| LBRACK l = separated_list(COMMA, VAR) COMMA RBRACK{ l }
But I got similar error messages. In the end, I tried to create my own version of separated_list with support for trailing commas.
bracket_list:
LBRACK l = separated_nonempty_list_option_trailing(COMMA, VAR) RBRACK { l }
separated_nonempty_list_option_trailing(separator, X):
| x = X
{ [x] }
| x = X separator xs = separated_nonempty_list(separator, X);
{ [x] @ xs }
| x = X separator { [x] }
This time, the error is finally resolved.
Here are my questions:
- Why did the second approach with the specified precedence not work? Is it because the parser cannot determine whether the token
COMMAbelongs tooption(COMMA)or the separated list? - I personally prefer to use more functions/macros from the standard library, so I wonder if there is any way to support trailing commas with
optionandseparated_list. - Why didn’t my third approach run into a problem? It seems to me that the parser will also get confused after reading in the token
Xand pointing at the tokenseparatorsince it does not know whether to reduce right in place according to the first production (x=X), or shift to read in tokenseparatorso that the third production could be used (x=X separator).
