Recursion in Menhir lexer for small DSL

Um, that regexp will match

/** foo bar */
....
...
/** goo boo */
...

by going from “foo” all the way through to “boo”, no? You want a slightly more complex regexp in the middle (instead of _*).

Hey Chet, yeah, those are all valid docblocks too, but they should return an empty list. Other valid examples:

/**
 * Mo mo mo, some info
 * @return void
 */

or

/** bla
 * @param array<string, int> $ar
more bla bla */

No, I mean that you’ll end up grabbing both docblocks, and all the code in between, won’t you? Lex will look for the longest match, right?

Ah crap, you’re right. Thanks, will fix. :pray:

Instead of _*, maybe you want something like:

( [^ '*'] | '*'+ [^ '/' '*'] )*

[I’m doing this on-the-fly, so I could be making a mistake here]
The idea is, you want the complement of the language of “*/”.

At least, I think that’s how it works – been so long I don’t quite remember anymore.


I’ll check the docs to see if I can match the shortest possible string instead. :slight_smile:

Oh wait, and then at the end, you need '*'* right before “*/”, so that a run of stars leading straight into the terminator is still accepted.

It’s only the character combination */ that stops the comment. I made a buffer now instead, with a separate rule. Didn’t find anything about non-greedy matching in ocamllex.

Yes, but the string “*/” also matches the regexp “_*”. So the input

/** abc */ x y z /** def */

will yield a single docblock, containing the entire line. Or at least, IIRC, that’s how lex will work.
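To make the difference concrete, here is a minimal plain-OCaml sketch of what the regexp suggested earlier in the thread (with the trailing '*'* added) recognises. It is not ocamllex output, just a hand-written two-state automaton for the same language, and the names `state` and `docblock_end` are made up for illustration. Because the language excludes any “*/” before the final one, the match can only end at the first terminator.

```ocaml
(* Hand-written equivalent of the ocamllex regexp
     "/**" ( [^ '*'] | '*'+ [^ '/' '*'] )* '*'* "*/"
   Hypothetical helper for illustration, not part of the original lexer. *)

type state =
  | Body   (* the previous character was not a '*' *)
  | Stars  (* we are inside a run of one or more '*' *)

(* If [s] has a docblock starting at [pos], return the index just past its
   closing "*/"; otherwise return None. *)
let docblock_end (s : string) (pos : int) : int option =
  let n = String.length s in
  let rec go state i =
    if i >= n then None (* input ended before any "*/" *)
    else
      match state, s.[i] with
      | Stars, '/' -> Some (i + 1)   (* a '*' followed by '/': stop here *)
      | _, '*' -> go Stars (i + 1)   (* start or extend a run of stars *)
      | _, _ -> go Body (i + 1)      (* any other character *)
  in
  if pos >= 0 && pos + 3 <= n && String.sub s pos 3 = "/**"
  then go Body (pos + 3)
  else None
```

On the input above, `docblock_end s 0` is `Some 10`, so the first docblock stops right after “abc */” instead of swallowing the whole line.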

For closure, this is what I ended up with:

and docblock_comment buffer = parse
  | "*/"                          { DOCBLOCK_AS_STR (Buffer.contents buffer) }
  | '\n'                          { new_line lexbuf; docblock_comment buffer lexbuf }
  | whitespace_char_no_newline+   { docblock_comment buffer lexbuf }
  | _? as s                       { Buffer.add_string buffer s; docblock_comment buffer lexbuf }
  | eof                           { failwith "unterminated docblock" }

Ah, that should work, IIRC because longest match wins: at a “*/”, the two-character “*/” rule beats the one-character wildcard.
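For anyone who wants to poke at that behaviour without running ocamllex, here is a rough plain-OCaml model of the rule. It is only a sketch: `scan_docblock` is a hypothetical name, it takes the position just after the opening “/**”, and it assumes whitespace_char_no_newline means space, tab, or carriage return, which the thread does not actually show. Checking for “*/” before falling back to a single character mimics longest match.

```ocaml
(* Plain-OCaml model of the docblock_comment rule (illustrative only).
   [s] is the whole input, [start] the index just after the opening "/**".
   Returns the buffered text and the index just past the closing "*/". *)
let scan_docblock (s : string) (start : int) : string * int =
  let n = String.length s in
  let buffer = Buffer.create 64 in
  let rec go i =
    if i >= n then failwith "unterminated docblock"   (* the eof case *)
    else if i + 1 < n && s.[i] = '*' && s.[i + 1] = '/' then
      (Buffer.contents buffer, i + 2)                 (* the "*/" case *)
    else
      match s.[i] with
      | '\n' | ' ' | '\t' | '\r' ->
        go (i + 1)                                    (* whitespace is dropped *)
      | c ->
        Buffer.add_char buffer c;                     (* any other character *)
        go (i + 1)
  in
  go start
```

With the one-line example from earlier, `scan_docblock s 3` gives `("abc", 10)`: whitespace never reaches the buffer, and scanning stops at the first terminator.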

It is simpler to write what you wrote than to calculate out the regexp, even if (to me) the regexp is … more satisfying :grin:

I didn’t find any way to make a non-greedy regexp, so. No choice. :no_mouth:
