Writing a parser for Pascal parse tree dump files, which can contain tree structures with multiple children like:
(node A
(node B)
(node C)
(node D
(node F)
)
)
(where B, C, and D are children of A, and F is a child of D). I’ve got a conflict-free grammar written but parsing child nodes just refuses to work, always erroring on the closing parenthesis of the child. For context, nodes can optionally have some data, and optionally have children. Here’s the core subset of my grammar:
%start main
%type <Vp_lifter.Parse_tree.parse_tree_node list> main node_list
%type <Vp_lifter.Parse_tree.parse_tree_node> node
%type <Vp_lifter.Parse_tree.parse_tree_node_type> node_type
%type <string> resultdef
%type <Vp_lifter.Parse_tree.flag list> flags flags_list
%type <(string * Vp_lifter.Parse_tree.pt_vtype) list> data_list
%type <(string * Vp_lifter.Parse_tree.pt_vtype)> data
%type <unit> eq_col
%%
main :
EOF { [] }
| node { [$1] }
node_list : { [] }
| node node_list { $1 :: $2 }
node :
LEFT_PARENTHESIS
node_type COMMA
resultdef COMMA
POS EQUALS LEFT_PARENTHESIS NUMBER COMMA NUMBER RIGHT_PARENTHESIS COMMA
LOC EQUALS IDENTIFIER COMMA
EXPECTLOC EQUALS IDENTIFIER COMMA
FLAGS EQUALS flags COMMA
CMPLX EQUALS NUMBER
data_list
node_list
RIGHT_PARENTHESIS
{
{
pt_type = $2;
resultdef = return_type_of_string $4;
pos = ($9, $11);
loc = loc_of_string $16;
expectloc = loc_of_string $20;
flags = $24;
cmplx = $28;
data = $29;
children = $30;
}
}
data_list : { [] }
| NIL data_list { $2 }
| data data_list { $1 :: $2 }
eq_col : EQUALS { () } | COLON { () }
data :
IDENTIFIER eq_col IDENTIFIER { ($1, String $3) }
| IDENTIFIER EQUALS NUMBER { ($1, Integer $3) }
and some sample input:
(blockn, resultdef = $void = "untyped", pos = (3,6), loc = LOC_INVALID, expectloc = LOC_INVALID, flags = [], cmplx = 0
(loadn, resultdef = LongWord = "DWord", pos = (7,5), loc = LOC_INVALID, expectloc = LOC_INVALID, flags = [nf_write], cmplx = 1
nil
symbol = X
) <-- 'unexpected token ")"' error here
)
Here, nil
is a data, as well as symbol = X
. Children always come after data in the dump files.
Just hoping to get a sanity check here. It seems like I’ve implemented optional children + data properly but I’ve been spinning my wheels for a while now.