Recursive sedlex.regexp

I’m converting a BNF grammar to sedlex using the ppx. Is there a sedlex ppx flag that enables recursion, so that the following works?

let tagged_ext_comp = [%sedlex.regexp? astring | tagged_ext_comp, (Star (white_space, tagged_ext_comp)) | lparen, tagged_ext_comp, rparen]

I do not think there is. A recursive regexp would be able to parse HTML (it's a famous meme, but it applies in this case): arbitrarily nested parentheses are not a regular language.
To do this kind of thing you need state to count the depth of the parens, and regexps don't have state.
You can either keep that state explicitly, with a recursive sedlex rule that takes an integer as a parameter, or write a menhir parser, which has implicit state (here the state would be a stack).
An example of a rule with a parameter, in ocamllex syntax:

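rule comment depth = parse
  | "(*" { (* nested opener: go one level deeper *)
           comment (depth + 1) lexbuf }
  | "*)" { if depth = 0 then
             (* outermost comment closed: resume normal lexing *)
             token lexbuf
           else
             comment (depth - 1) lexbuf }
  | eof  { failwith "unterminated comment" }
  | _    { (* skip any other character *)
           comment depth lexbuf }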

This example assumes the existence of another rule, token, and switches back to that rule when the comment is done, since this lexer discards comments. You could also return whatever you captured instead.
An integer is enough state if you have only one kind of parens. If you need to mix [] and (), with each ( matched by a ) and not a ], then you need a stack as state, and you might as well use menhir because it's made exactly for that.
You may also notice that an integer can be viewed as a stack with only one possible value in each cell, where 0 = [] and x + 1 = V :: x, which is neat in my opinion.
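To make the stack idea concrete, here is a minimal sketch in plain OCaml (no sedlex or menhir involved) that checks mixed () and [] with an explicit stack. The name balanced is made up for this illustration; a real menhir grammar would keep this stack for you implicitly.

```ocaml
(* Hypothetical helper: true when every ( or [ is closed by the
   matching kind of bracket; other characters are ignored. *)
let balanced (s : string) : bool =
  let rec go i stack =
    if i = String.length s then stack = []   (* nothing may be left open *)
    else
      match s.[i], stack with
      (* opener: push it on the stack *)
      | (('(' | '[') as c), _ -> go (i + 1) (c :: stack)
      (* closer that matches the top of the stack: pop *)
      | ')', '(' :: rest | ']', '[' :: rest -> go (i + 1) rest
      (* closer with a wrong or missing opener *)
      | (')' | ']'), _ -> false
      (* any other character leaves the stack unchanged *)
      | _, _ -> go (i + 1) stack
  in
  go 0 []
```

With a single integer you could not tell "([)]" apart from "([])"; the stack remembers which kind of bracket is open.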
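A small sketch of that correspondence, taking V to be () so that the stack is a unit list. All the names below are invented for the illustration.

```ocaml
(* Integer depth -> stack whose cells can only hold (). *)
let rec stack_of_depth (n : int) : unit list =
  if n <= 0 then [] else () :: stack_of_depth (n - 1)

(* Stack -> integer depth: just count the cells. *)
let depth_of_stack (s : unit list) : int = List.length s

(* Incrementing the counter mirrors pushing a cell... *)
let push_depth (n : int) : int = n + 1
let push_stack (s : unit list) : unit list = () :: s

(* ...and decrementing mirrors popping one (None when already empty). *)
let pop_depth (n : int) : int option = if n > 0 then Some (n - 1) else None
let pop_stack (s : unit list) : unit list option =
  match s with
  | [] -> None
  | () :: rest -> Some rest
```

Going through stack_of_depth and back through depth_of_stack gives the same non-negative integer, which is why the counter in the comment rule behaves like a one-symbol stack.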

Thanks Emile - I’ll use menhir as you suggest - Cheers
