Call Menhir-generated parser without a lexbuf

OK, so I’ve set up two parsers and one lexer. I’m getting a token list from the lexer, then calling the second parser from the first parser. BUT, now the second parser expects a lexbuf, when I already have a token list (and in fact no lexbuf). When I look at the generated “” file, I only find one function that’s supposed to be used from the outside, which has the same name as the start rule in the grammar. So I’d have to hack around it? Or convert the token list to a string or something? :thinking:

Generated function:

let docblock =
  fun _menhir_lexer _menhir_lexbuf ->
    let _menhir_stack = () in
    let MenhirBox_docblock v = _menhir_run_0 _menhir_stack _menhir_lexbuf _menhir_lexer in

I want to be able to do this in the main parser (unrelated code removed):

    | d=DOCBLOCK "function" {
        Function {
            docblock = Docblockparser.docblock d;

where DOCBLOCK is defined as

%token <Docblockparser.token list> DOCBLOCK

Or maybe I should just hard-code the grammar of docblock in the lexer instead… It follows a pretty rigid pattern. :thinking:

There are as many functions in the generated interface of the parser as there are start symbols in the grammar: each of these functions has type (Lexing.lexbuf -> token) -> Lexing.lexbuf -> t, where t is the type associated with the corresponding start symbol. To call such a function, you therefore have to provide a function Lexing.lexbuf -> token and a lexbuf. As far as I know, the lexbuf is only used by the parser to extract position information, and is otherwise passed unchanged as an argument to the provided function to get tokens. Therefore, if you have a token list, you can convert it to a function that dispenses tokens and ignores the lexbuf. For instance:

let dispenser_of_token_list (l : token list) : Lexing.lexbuf -> token =
  let d = Seq.to_dispenser (List.to_seq l) in
  fun _lexbuf -> Option.get (d ())

and then you just have to give a dummy lexbuf to the parser function (for instance, Lexing.from_string "").
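Seq.to_dispenser was added in OCaml 4.14. On an older compiler, a minimal sketch of the same idea keeps the remaining tokens in a mutable reference. The token type and the count_tokens stand-in parser below are hypothetical and only there to make the example self-contained; a Menhir-generated entry point has the same shape, (Lexing.lexbuf -> token) -> Lexing.lexbuf -> t.

```ocaml
(* Hypothetical token type, standing in for the one Menhir generates. *)
type token = A | B | EOF

(* Turn a token list into a lexer-like function that ignores its lexbuf.
   Raises Invalid_argument once the list is exhausted. *)
let dispenser_of_token_list (l : token list) : Lexing.lexbuf -> token =
  let remaining = ref l in
  fun _lexbuf ->
    match !remaining with
    | [] -> invalid_arg "dispenser_of_token_list: no more tokens"
    | tok :: rest ->
        remaining := rest;
        tok

(* Hypothetical stand-in for a generated parser: counts tokens up to EOF. *)
let count_tokens (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
  let rec loop n =
    match lexer lexbuf with
    | EOF -> n
    | A | B -> loop (n + 1)
  in
  loop 0

(* The dummy lexbuf is never read; it only satisfies the signature. *)
let n = count_tokens (dispenser_of_token_list [A; B; A; EOF]) (Lexing.from_string "")
```

Here the list ends with an explicit EOF token, so the stand-in parser stops before the dispenser runs out.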

However, positions are lost with this approach, which prevents your code from reporting errors correctly. As far as I know, the lexbuf fields used for error reporting are lex_start_p and lex_curr_p (the start and end positions of the current token). You can therefore enrich your token list into a (token * Lexing.position * Lexing.position) list, saving for each token the values of lexbuf.lex_start_p and lexbuf.lex_curr_p in the lexer, and restoring these values (the fields are mutable) in the dispenser.
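A minimal sketch of such a position-restoring dispenser, assuming a hypothetical token type and hand-built positions (in a real lexer the positions would come from the lexbuf at the time each token is produced, for instance via Lexing.lexeme_start_p and Lexing.lexeme_end_p):

```ocaml
(* Hypothetical token type, standing in for the one Menhir generates. *)
type token = WORD of string | EOF

type positioned = token * Lexing.position * Lexing.position

(* Lexer-like function that replays tokens and writes their saved
   positions into the (dummy) lexbuf before returning each token. *)
let lexer_of_positioned_tokens (l : positioned list) : Lexing.lexbuf -> token =
  let remaining = ref l in
  fun lexbuf ->
    match !remaining with
    | [] -> failwith "lexer_of_positioned_tokens: no more tokens"
    | (tok, startp, endp) :: rest ->
        remaining := rest;
        lexbuf.Lexing.lex_start_p <- startp;
        lexbuf.Lexing.lex_curr_p <- endp;
        tok

(* Simplified position builder for the example: pos_bol is left at 0. *)
let pos line col =
  { Lexing.pos_fname = "docblock"; pos_lnum = line; pos_bol = 0; pos_cnum = col }

let lexbuf = Lexing.from_string ""

let lexer =
  lexer_of_positioned_tokens
    [ (WORD "@param", pos 3 4, pos 3 10); (EOF, pos 3 10, pos 3 10) ]

(* After one call, the lexbuf carries the positions of the first token. *)
let first = lexer lexbuf
let start_line = lexbuf.Lexing.lex_start_p.Lexing.pos_lnum
let end_col = lexbuf.Lexing.lex_curr_p.Lexing.pos_cnum
```

The resulting function can be passed to a generated entry point together with the same dummy lexbuf, so that positions read by the parser are those of the original tokens.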


The MenhirLib library has a few options to convert a lexbuf-based parser generated by Menhir into a dispenser-based parser. The simplest one is MenhirLib.Convert.Simplified.traditional2revised:

let revised_parser dispenser =

Hm, what does this mean?

I just keep getting weird errors…

Error: docblock generates the language {epsilon}.


%start<Ast.docblock_comment list> docblock

  | {[]}


This is really cool, thanks. :slight_smile: Just gotta get the second parser logic to work now.

“A” for effort, but this is not available on my 4.08 installation, and when switching to 4.14, expect_test_helpers_kernel becomes uninstallable for some reason, with the debug output just claiming “conflict!”, hehe. I reported it to their repository.

Testing with OSeq instead…

Hm, getting

Invalid_argument “option is None”

inside the generated parser code. The failing line looks like this:

let _tok = _menhir_lexer _menhir_lexbuf in

so probably related to the lexbuf somehow?

Token provider code:

 let linebuf = Lexing.from_string "" in
 let dispenser_of_token_list (l : Docblockparser.token list) : Lexing.lexbuf -> Docblockparser.token =
     let d = OSeq.to_gen (OSeq.of_list l) in
     fun _lexbuf -> Option.get (d ()) in
 let disp = dispenser_of_token_list cb in
 let cb = if List.length cb > 0 then Docblockparser.docblock disp linebuf else [] in
 Function { ... }

OK, maybe I’d better separate the docblock parser into a completely separate lexer+parser step instead of trying to merge them together… Unless parameterized rules in Menhir would work. Dunno, never tried.

This error means that the parser tries to get a token after the end of the list has been reached (in that case, d () returns None, and Option.get fails). Whether the parser is expected to reach the end of the list depends on how you have designed your grammar (whether or not the phrase has an explicit end in the token list). If the end of the phrase is implicit in the token list, then you probably want to introduce an End_of_stream token, and either add End_of_stream explicitly at the end of the token list, or use something like Option.value ~default:End_of_stream (d ()).
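A minimal sketch of the second option, assuming a hypothetical token type in which End_of_stream is the end-of-input token the grammar expects:

```ocaml
(* Hypothetical token type; End_of_stream plays the role of the explicit
   end-of-input token that the grammar would have to expect. *)
type token = Tag of string | End_of_stream

(* Dispenser that never runs dry: once the list is exhausted it keeps
   returning End_of_stream instead of failing on Option.get. *)
let dispenser_of_token_list (l : token list) : Lexing.lexbuf -> token =
  let remaining = ref l in
  let next () =
    match !remaining with
    | [] -> None
    | tok :: rest ->
        remaining := rest;
        Some tok
  in
  fun _lexbuf -> Option.value ~default:End_of_stream (next ())

let lexer = dispenser_of_token_list [Tag "param"]
let lexbuf = Lexing.from_string ""

(* The first call yields the only real token; later calls yield the default. *)
let t1 = lexer lexbuf
let t2 = lexer lexbuf
let t3 = lexer lexbuf
```

With this variant the grammar still has to consume End_of_stream at the end of its start rule; otherwise the parser may stop early or report a syntax error on it.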


@octachron, I didn’t know about these conversion functions, thanks!

A dispenser-based parser is a parser that takes as input a “dispenser”, that is to say a function unit -> t for some t, instead of a lexbuf. In the case of what MenhirLib expects, t is token * Lexing.position * Lexing.position, that is to say tokens enriched with position information.
