Need help with a menhir grammar

Chimrod · August 29, 2021, 3:11pm

Hi !

I’m stuck with conflict problem in menhir that I do not know how to solve.

I want to declare an ending sequence after a initial sequence:

%token B
%token W
%token EOL

%start<unit> main

%% 

seq    :                {}
       | seq B          {}
       | seq W          {}
       ;

ending :                {}
       | W W W          {}
       ;

main   : seq ending EOL {}

This grammar accept a sequence like B W B W W W EOL, but fail with B W W B EOL

$ echo "B W W B EOL" | menhir --interpret --interpret-show-cst bw.mly 
Warning: one state has shift/reduce conflicts.
Warning: one shift/reduce conflict was arbitrarily resolved.
Ready!
REJECT

I’ve seen the conflict pointed by the parser, and I understand it, but I do not know how to deal with the priority operators in order to solve it.

Does anybody know a way to make the grammar accept an ending sequence and give the it the priority over the initial sequence ?

Regards

Chet_Murthy · August 29, 2021, 6:24pm

Why should “B W W B” not be accepted? “ending” can parse the empty string, and seq is effectively (“B” | “W”)* isn’t it?

Can you specify more precisely what language you want to parse? [I recognize the contradiction in asking for “more precisely” when, in our field, the way you do that, is to write a grammar *grin*]

Chet_Murthy · August 29, 2021, 9:23pm

The language you’re trying to accept is:

(B|W)*(WWW)?<EOL>

yes?

Chimrod · August 30, 2021, 7:16am

Thanks for taking a look to my problem!

Why should “B W W B” not be accepted?

This what I do not fully understand. This shall be accepted on the paper, but menhir reject this sequence.

Can you specify more precisely what language you want to parse?

I have reduce the problem to a minimal (non-)working example, but my intention is to parse the conjugation of French verbs. To keep it simple, in English with the verb “to walk” you have the stem “walk” and thoses suffixes:

walk-{}
walk-ed
walk-ing
walk-s
…

In French you have much more declinations, and I’m falling in an error like the one presented in the first grammar:

err-e
err-er
err-erez
err-ez
…

The menhir documentation explain how to deal with conflicts, and the use of priority operators in order to tell the parser which rule to apply, but I didn’t find a way to make it accept my grammar with only two tokens!

Gopiandcode · August 30, 2021, 11:24am

Chimrod:

echo "B W W B EOL" | menhir --interpret --interpret-show-cst bw.mly 
Warning: one state has shift/reduce conflicts.
Warning: one shift/reduce conflict was arbitrarily resolved.
Ready!
REJECT

It seems like the grammar has an inherent ambiguity no?

Should B W W W be parsed as

[seq: [seq: [seq: [seq: [seq:] B] W] W] W]
[seq: [seq:] B] [ending: W W W]

Chet_Murthy · August 30, 2021, 7:16pm

OK, now I get your problem. I thought you were asking why the -grammar- should not accept BWWB, not why the menhir automaton did not accept it.

Sure, your grammar is ambiguous (as @Gopiandcode points out). So let’s step away from it: what language are you trying to parse?

In the examples you give, are what are the tokens? Are they “walk”, “ed”, “ing”, “err”, “e” etc? Or are they characters? There’s a well-understood problem that there can be nondeterminism at the level of characters, which can be handled in a lexer, leaving determinism at the level of parsing. So a better explanation of the language you’re trying to parse would be useful.

But backing up, if you’re actually trying to parse actual natural language, I suspect you probably want to back up and look into the literature. using deterministic parsers for this stuff failed decades ago, and an entire field of probabilistic parsing grew up in its stead. [I’m no specialist, so it’s possible that probabilistic parsing got supplanted by something else, too] Also, even if you don’t go look at that stuff, you probably will want to at least look into Earley or CYK parsing: so you can write arbitrary CFGs.

Chimrod · August 31, 2021, 10:47am

Thanks for the clarification. I admit that I embarked on this project without really taking the time to read up on the existing points beforehand, seeing it as both a way to learn OCaml grammars and to explore the field of linguistics.

I’ll take a look at it all with a cool head.

Topic		Replies	Views
Resolving shift/reduce conflict in Menhir grammar Learning menhir	2	4971	January 6, 2020
Menhir conflict for Julia grammar Learning menhir	4	1251	November 5, 2020
Help wanted understanding Menhir parser Learning	8	3385	September 16, 2018
Lookahead in menhir parser Learning	8	2236	February 13, 2019
Parser beginner needs help Learning parser	3	1401	October 5, 2019

Need help with a menhir grammar

Related topics