Hmmm OK. Can I fix it by passing a second parser in to the main parser? Or by flattening the list somehow? @Chet_Murthy also recommended having a stateful lexer with two rules, which I’ll try.
Menhir has some built-in grammar rules for lists. Would it be better to have the lexer just generate the raw tokens (possibly the list separator tokens too) and let the grammar handle the construction of the list?
See separated_list(sep, element): you write the grammar for the separator (which could be just a single token) and for the element (again, could be just a token such as DOCBLOCKELEMENT, declared with %token <mytoken> DOCBLOCKELEMENT), and Menhir will expand that internally to the proper grammar to match and construct a mytoken list.
If you have start/end markers for your list then see the delimited(start, middle, end) rule to help you write it more easily.
And you can combine them, I think something like this should work: delimited(LIST_START, separated_list(SEP, ELEMENT), LIST_END) where the capitals are all tokens produced by the lexer (but they could also be other grammar rules if you need something more complicated).
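To make that concrete, here is a minimal sketch of what such a grammar file could look like. The token names are the placeholder ones from above, and the string payload on ELEMENT and the rule name doc_list are assumptions for illustration only, not anything from the original code:

```ocaml
(* Menhir grammar fragment (.mly), not a plain OCaml file. *)
%token LIST_START LIST_END SEP EOF
%token <string> ELEMENT   (* payload type is an assumption *)
%start <string list> doc_list
%%

doc_list:
  | items = delimited(LIST_START, separated_list(SEP, ELEMENT), LIST_END) EOF
    { items }
```

Here separated_list returns the list of ELEMENT payloads and delimited just passes the middle value through, so items should already be the string list with no manual list construction in the actions.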
I’d love to, but I have no idea how to achieve that in the lexer, since it is a “sub-language”/DSL. Well, except for splitting the lexer into two states. Will try it later today.
Ah I see, it is like a context-dependent grammar and you have to switch lexers. So essentially the “sub-language” has to do both its lexing and what you’d typically put in the “grammar” inside the lexer itself, leaving you with just the outer grammar to implement in Menhir.
In that case, can you match something like this in the grammar?
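For what it's worth, the "two states" idea can be sketched by hand in plain OCaml, without ocamllex. Everything here is hypothetical: the token types, the "/**" and "*/" markers, and the simplification that tokens (including the closing "*/") are separated by whitespace:

```ocaml
(* Hypothetical token types; the names are illustrative only. *)
type innertoken = Tag of string | Word of string

type token =
  | IDENT of string
  | DOCBLOCK of innertoken list
  | EOF

let is_space c = c = ' ' || c = '\n' || c = '\t'

(* Read a maximal run of non-space characters starting at [i]. *)
let read_word s i =
  let n = String.length s in
  let j = ref i in
  while !j < n && not (is_space s.[!j]) do incr j done;
  (String.sub s i (!j - i), !j)

(* Inner state: collect inner tokens until the closing "*/" is seen. *)
let rec lex_inner s i acc =
  let n = String.length s in
  if i >= n then failwith "unterminated doc block"
  else if is_space s.[i] then lex_inner s (i + 1) acc
  else
    let w, j = read_word s i in
    if w = "*/" then (List.rev acc, j)
    else if w.[0] = '@' then
      lex_inner s j (Tag (String.sub w 1 (String.length w - 1)) :: acc)
    else lex_inner s j (Word w :: acc)

(* Outer state: ordinary tokens, switching to [lex_inner] on "/**". *)
let rec lex_outer s i acc =
  let n = String.length s in
  if i >= n then List.rev (EOF :: acc)
  else if is_space s.[i] then lex_outer s (i + 1) acc
  else
    let w, j = read_word s i in
    if w = "/**" then
      let inner, k = lex_inner s j [] in
      lex_outer s k (DOCBLOCK inner :: acc)
    else lex_outer s j (IDENT w :: acc)

let tokenize s = lex_outer s 0 []
```

The outer grammar then only ever sees a single DOCBLOCK token carrying the already-built inner list, which is the situation described above.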
myrule:
| lst = DOCBLOCK { DocBlock lst }
| ... (* more rules here *) ...
And have an OCaml type:
type t =
| DocBlock of innertoken list
given a %token <innertoken list> DOCBLOCK
If needed, that innertoken list could be fed to another Menhir parser: instead of DocBlock lst you’d write DocBlock (OtherParser.of_inner_tokens lst), where OtherParser would construct a lexing stream directly out of the list (no need to invoke a lexer) and call the other Menhir parser with it.
In this case I didn’t use Menhir at all, I just combined the 2 lexers by hand (it was simpler that way, but it should be possible to achieve the same with Menhir).
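A rough sketch of the "lexing stream out of a list" part, assuming the usual shape of a Menhir-generated entry point, (Lexing.lexbuf -> token) -> Lexing.lexbuf -> result. The token type, the INNER_EOF token and fake_inner_parser are made up here; fake_inner_parser only stands in for the generated OtherParser entry point so that the snippet runs on its own:

```ocaml
(* Hypothetical inner token type; a real one would come from the inner grammar. *)
type innertoken = Tag of string | Word of string | INNER_EOF

(* Turn a token list into a function of the shape a generated parser
   expects from its lexer. The lexbuf argument is ignored. *)
let lexer_of_list (tokens : innertoken list) : Lexing.lexbuf -> innertoken =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> INNER_EOF (* keep answering EOF once the list is exhausted *)
    | tok :: rest -> remaining := rest; tok

(* Stand-in for the generated entry point: it pulls tokens until EOF
   and returns a printable description of each one. *)
let fake_inner_parser (next : Lexing.lexbuf -> innertoken) (lexbuf : Lexing.lexbuf) =
  let rec loop acc =
    match next lexbuf with
    | INNER_EOF -> List.rev acc
    | Tag t -> loop (("tag:" ^ t) :: acc)
    | Word w -> loop (("word:" ^ w) :: acc)
  in
  loop []

(* What a hypothetical OtherParser.of_inner_tokens could look like. *)
let of_inner_tokens tokens =
  let dummy = Lexing.from_string "" in (* never actually read from *)
  fake_inner_parser (lexer_of_list tokens) dummy
```

One caveat with this approach: since the lexbuf is a dummy, any source positions the inner parser reports would be meaningless unless you carry them along with the tokens yourself.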