How to convert string to type "regexp"?

HoraDeComer · June 12, 2020, 8:47pm

Chet_Murthy · June 12, 2020, 9:58pm

You’ll need a parser (and perhaps a lexer). There are several ways to do that: you can use ocamlyacc/menhir (with ocamllex/sedlex), or one of the parser-combinator libraries (IIRC, Angstrom is one). You could also write a recursive-descent parser by hand (nobody does that anymore except in first-year programming classes, grin).
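To give a feel for what "parsing combinators" means, here is a minimal hand-rolled sketch. It is not Angstrom's API: the type `parser` and the values `return`, `>>=`, `<|>`, `satisfy`, `many`, `number`, `term_op` and `sum` are all defined locally for illustration, and a real library handles errors, input buffering and efficiency far better.

```ocaml
(* A parser reads string s from index i and returns a value plus the next
   index, or None on failure. All names here are local to this sketch. *)
type 'a parser = string -> int -> ('a * int) option

(* Succeed without consuming input. *)
let return (x : 'a) : 'a parser = fun _ i -> Some (x, i)

(* Sequencing: run p, then feed its result to f. *)
let ( >>= ) (p : 'a parser) (f : 'a -> 'b parser) : 'b parser =
  fun s i ->
    match p s i with
    | None -> None
    | Some (x, j) -> f x s j

(* Choice: try p, and fall back to q at the same position if p fails. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser =
  fun s i -> match p s i with None -> q s i | r -> r

(* Consume one character satisfying the predicate. *)
let satisfy (ok : char -> bool) : char parser =
  fun s i ->
    if i < String.length s && ok s.[i] then Some (s.[i], i + 1) else None

(* Zero or more repetitions. p must consume input when it succeeds,
   otherwise this loops forever. *)
let rec many (p : 'a parser) : 'a list parser =
  fun s i ->
    match p s i with
    | None -> Some ([], i)
    | Some (x, j) ->
      (match many p s j with
       | Some (xs, k) -> Some (x :: xs, k)
       | None -> Some ([x], j))

let digit : char parser = satisfy (fun c -> c >= '0' && c <= '9')

(* One or more digits, converted to an int. *)
let number : int parser =
  digit >>= fun d ->
  many digit >>= fun ds ->
  return (int_of_string (String.of_seq (List.to_seq (d :: ds))))

(* A plus or minus sign, returned as the function it denotes. *)
let term_op : (int -> int -> int) parser =
  (satisfy (fun c -> c = '+') >>= fun _ -> return ( + ))
  <|> (satisfy (fun c -> c = '-') >>= fun _ -> return ( - ))

(* A number followed by any number of (sign, number) pairs, evaluated
   left to right. *)
let sum : int parser =
  number >>= fun first ->
  many (term_op >>= fun op -> number >>= fun n -> return (op, n))
  >>= fun rest ->
  return (List.fold_left (fun acc (op, n) -> op acc n) first rest)
```

For example, `sum "12+30-2" 0` evaluates to `Some (40, 7)`: the value, and the index where parsing stopped.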

If you’ve never written a parser before, I’d strongly suggest you back up and learn how to write one for something like a simple arithmetic-expression language (+, -, /, *, unary-minus) before approaching your problem, simply because for the arithmetic-expression language, it’ll be -easy- to discern when you’re making mistakes.

Also, that is typically the sort of toy problem that is presented in most tutorials for parser-generation tools (like yacc/menhir, angstrom, etc).
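As a rough idea of what that exercise can look like, here is a small hand-written recursive-descent parser and evaluator for such an arithmetic language. It is only a sketch under my own assumptions: the type `expr`, the functions `parse_expr` and `eval`, the integer-only numbers, and the error messages are all made up for illustration and are not taken from any tutorial or tool.

```ocaml
(* AST for a toy arithmetic language: + - * / and unary minus. *)
type expr =
  | Num of int
  | Neg of expr
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr
  | Div of expr * expr

(* Grammar assumed for this sketch, lowest precedence first:
     expr:   a term, then any number of (plus or minus, term)
     term:   a factor, then any number of (times or divide, factor)
     factor: minus factor, or a parenthesised expr, or a run of digits
   Binary operators associate to the left. *)
let parse_expr (s : string) : expr =
  let n = String.length s in
  let pos = ref 0 in
  let peek () = if !pos < n then Some s.[!pos] else None in
  let advance () = incr pos in
  let rec skip_spaces () =
    match peek () with
    | Some ' ' -> advance (); skip_spaces ()
    | _ -> ()
  in
  let rec expr () =
    let rec loop left =
      skip_spaces ();
      match peek () with
      | Some '+' -> advance (); loop (Add (left, term ()))
      | Some '-' -> advance (); loop (Sub (left, term ()))
      | _ -> left
    in
    loop (term ())
  and term () =
    let rec loop left =
      skip_spaces ();
      match peek () with
      | Some '*' -> advance (); loop (Mul (left, factor ()))
      | Some '/' -> advance (); loop (Div (left, factor ()))
      | _ -> left
    in
    loop (factor ())
  and factor () =
    skip_spaces ();
    match peek () with
    | Some '-' -> advance (); Neg (factor ())
    | Some '(' ->
      advance ();
      let e = expr () in
      skip_spaces ();
      (match peek () with
       | Some ')' -> advance (); e
       | _ -> failwith "expected ')'")
    | Some ('0' .. '9') ->
      let start = !pos in
      while (match peek () with Some ('0' .. '9') -> true | _ -> false) do
        advance ()
      done;
      Num (int_of_string (String.sub s start (!pos - start)))
    | _ -> failwith "expected a number, '-' or '('"
  in
  let e = expr () in
  skip_spaces ();
  if !pos <> n then failwith "unexpected trailing input";
  e

(* Evaluate the AST with integer arithmetic. *)
let rec eval = function
  | Num k -> k
  | Neg e -> - (eval e)
  | Add (a, b) -> eval a + eval b
  | Sub (a, b) -> eval a - eval b
  | Mul (a, b) -> eval a * eval b
  | Div (a, b) -> eval a / eval b
```

The nice property of this toy language is that mistakes show up immediately: `eval (parse_expr "1 + 2 * 3")` should be 7, and if you get 9 you know your precedence is wrong.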

If you’ve never done this before, don’t be afraid: we all had to do it once, and it’s completely surmountable. There are many books on the subject: any decent compiler-construction book will have a chapter or two on lexing and parsing at the beginning, and it’s definitely worth finding one and reading/following that, also.

HoraDeComer · June 12, 2020, 10:08pm

I think that is my case.

Chet_Murthy · June 12, 2020, 10:11pm

Take heart. I have a friend who teaches an intro compiler course, and he told me once that he spends most of the time on parsing and lexing … and not on semantic analysis (e.g. typechecking) and code generation. Why? Because in their careers, every one of his students will end up designing and implementing “little languages” over-and-over-and-over. They will need to be very comfortable with parsing/lexing. But very few will have the privilege of writing a real compiler with type-checking and code-generation, or even modifying such a compiler. It’s just not that common a task for programmers, even though it’s an important one.

So he spends most of his energy making sure his students know how to do the tasks that they’re going to see in the real world.

ETA: concretely, I’m sure many people will have good suggestions for compiler books. I’m old enough that it was the “Dragon Book” (“Compilers: Principles, Techniques, and Tools” by Aho, and eventually a cast of other well-regarded CS profs). Really, any compiler book will do, b/c you’re wanting to learn the -beginning- of the book. I searched for one that presented the material in OCaml, to no avail. But also, the part that you will need to learn is completely independent of the programming language you choose to implement in: the technology of lexing and parsing is all implementation-language-agnostic. Hence, “ocamlyacc” takes its name from “yacc”, which generated code for C. IIRC the original caml-light “camlyacc” actually invoked yacc under the covers to generate the parsing tables, but I could be mis-remembering (gettin’ old).

Take heart! You can learn this! And you’ll use it over and over throughout your career!

kandu · June 22, 2020, 9:06am

I once needed a concurrent parser combinator library that accepts network channels directly, but neither the Go nor the Erlang community seemed to have one.

By the way, a regexp engine with such capabilities was also created.

The 44-line function str2reg shows how:

https://github.com/kandu/ok_parsec/blob/master/src/re.ml#L54


  | '\\' ->
    (match buf.str.[buf.pos+1] with
    | 'x'->
      let c= "0x" ^ (String.sub buf.str (buf.pos+2) 2)
        |> int_of_string
        |> char_of_int
      in (TkC c, {buf with pos= buf.pos+4})
    | c -> (TkC c, {buf with pos= buf.pos+2}))
  | c -> (TkC c, {buf with pos= buf.pos+1})

let str2reg str=
  let rec buf2reg buf=
    let getItem= function
      | TkE-> Ce
      | TkC c-> C c
      | TkS s -> buf2reg @@ initBuffer s
      | other -> raise @@ Failure
          (Printf.sprintf "match %s"
            (match other with OpC -> "OpC" | OpO -> "OpO" | _ -> "Tk"))
    in
    let rec parse curr nextbuf=

And https://github.com/kandu/ok_parsec/blob/master/src/re.ml#L221 is

let make str= (dfa2sm (nfa2dfa (reg2nfa (str2reg str))))

which illustrates the entire process by which a string is transformed: string -> internal regex representation -> NFA -> DFA -> state machine.
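For readers who just want to see the first step (string -> regex representation) in isolation, here is a self-contained sketch. It is not the code from ok_parsec: the type `regexp`, the functions `regexp_of_string` and `matches`, and the tiny syntax supported (literal characters, `|`, `*`, parentheses, backslash escapes) are assumptions made for this example. To keep it short, the matcher is a simple backtracking one rather than the NFA -> DFA -> state machine pipeline described above.

```ocaml
(* A hypothetical regex representation, much smaller than a real engine's. *)
type regexp =
  | Empty                      (* matches the empty string *)
  | Char of char
  | Seq of regexp * regexp
  | Alt of regexp * regexp
  | Star of regexp

(* Recursive descent over the string. Precedence, lowest first:
   alternation, then concatenation, then star, then atoms. *)
let regexp_of_string (s : string) : regexp =
  let n = String.length s in
  let pos = ref 0 in
  let peek () = if !pos < n then Some s.[!pos] else None in
  let advance () = incr pos in
  let rec alt () =
    let left = seq () in
    match peek () with
    | Some '|' -> advance (); Alt (left, alt ())
    | _ -> left
  and seq () =
    match peek () with
    | None | Some ('|' | ')') -> Empty
    | Some _ ->
      let first = rep () in
      (match seq () with
       | Empty -> first
       | rest -> Seq (first, rest))
  and rep () =
    let rec stars r =
      match peek () with
      | Some '*' -> advance (); stars (Star r)
      | _ -> r
    in
    stars (atom ())
  and atom () =
    match peek () with
    | Some '(' ->
      advance ();
      let r = alt () in
      (match peek () with
       | Some ')' -> advance (); r
       | _ -> failwith "expected ')'")
    | Some '\\' ->
      (* A backslash makes the next character a literal. *)
      advance ();
      (match peek () with
       | Some c -> advance (); Char c
       | None -> failwith "dangling backslash")
    | Some '*' -> failwith "nothing to repeat"
    | Some c -> advance (); Char c
    | None -> failwith "unexpected end of input"
  in
  let r = alt () in
  (* alt only stops early at a closing parenthesis with no opener. *)
  if !pos <> n then failwith "unexpected ')'";
  r

(* Backtracking matcher in continuation-passing style: m r i k tries to
   match r at index i and calls k with each possible end index. *)
let matches (r : regexp) (s : string) : bool =
  let n = String.length s in
  let rec m r i k =
    match r with
    | Empty -> k i
    | Char c -> i < n && s.[i] = c && k (i + 1)
    | Seq (a, b) -> m a i (fun j -> m b j k)
    | Alt (a, b) -> m a i k || m b i k
    (* j > i stops a starred empty match from looping forever. *)
    | Star a -> k i || m a i (fun j -> j > i && m r j k)
  in
  m r 0 (fun j -> j = n)
```

With these definitions, `regexp_of_string "ab*"` gives `Seq (Char 'a', Star (Char 'b'))`, and `matches (regexp_of_string "(a|b)*c") "abbac"` is `true`.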

There is also a test executable in the test directory. Invoke it and it writes Graphviz files for the intermediate NFA, DFA, and state machine, generates SVG images from them, and finally opens the image for you.
