Different regex rules between Python & OCaml (kinda) confuse me lol

In Python, if you want to match literal parentheses, you need to escape them:

Python 3.11.2 (main, Feb 16 2023, 03:20:12) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> TEXT = "Hello World! (Not really cuz I'm in a bad mood)"
>>> regex = re.compile(r"(\(.*\))")
>>> regex.findall(TEXT)
["(Not really cuz I'm in a bad mood)"]

In OCaml, if you escape parentheses, you’re writing a group; to match literal parentheses, you do not escape them:

utop # let text = "Hello World! (Not really)";;
val text : string = "Hello World! (Not really)"

utop # let regex = Str.regexp "\\(.*\\)";;
val regex : Str.regexp = <abstr>

utop # Str.global_replace regex "!!!!!!" text;;
- : string = "!!!!!!!!!!!!"

utop # let regex = Str.regexp "(.*)";;
val regex : Str.regexp = <abstr>

utop # Str.global_replace regex "!!!!!!" text;;
- : string = "Hello World! !!!!!!"

If you write it in Python style (single backslashes), you get warnings:

utop # let regex = Str.regexp "\(.*\)";;
Line 1, characters 24-26:
Warning 14 [illegal-backslash]: illegal backslash escape in string.
Line 1, characters 28-30:
Warning 14 [illegal-backslash]: illegal backslash escape in string.
val regex : Str.regexp = <abstr>

But it still works?

utop # Str.global_replace regex "!!!!!!" text;;
- : string = "!!!!!!!!!!!!"

I have to be honest, as a literal CS n00b who knows only some Python, it confuses me a lot…

To avoid the extra escaping, which is the problem you are facing, you can use quoted strings.

(* the usual *)
let s = "\\(.*\\)"

(* with quoted strings *)
let s = {|\(.*\)|}
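
For comparison, since the question comes from a Python background: Python's raw string literals play a role similar to OCaml's quoted strings. Here is a minimal sketch; the variable names are made up for illustration. Note that the pattern means something different in each library: Python's `re` reads it as literal parentheses, while `Str` reads it as a group.

```python
import re

# The usual: each backslash is doubled inside an ordinary string literal.
escaped = "\\(.*\\)"

# With a raw string literal: backslashes are kept exactly as written.
raw = r"\(.*\)"

# Both literals denote the same 6-character pattern: \(.*\)
print(escaped == raw)  # True
print(len(raw))        # 6

# In Python's re syntax, this pattern matches *literal* parentheses.
print(re.findall(raw, "Hello World! (Not really)"))  # ['(Not really)']
```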

I think it isn’t so much a difference between Python and OCaml as between Python’s regexps and OCaml’s str regexps. There are a number of different regexp packages for OCaml: str, pcre, and re (which itself supports several regexp syntaxes). They all have slightly different syntaxes. And (heh) this isn’t news: grep and egrep have different syntaxes, too! I’m not going to go look, but I remember that precisely what you experience is also true there: grep requires escaping parens for grouping, while egrep groups with bare parens and treats escaped parens as literals.
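
To make the two conventions concrete, here is a minimal Python sketch that swaps the escaping of parentheses in a pattern, converting between the str/grep style (escaped parens group) and the Python/egrep style (bare parens group). The helper `swap_paren_escaping` is hypothetical and written only for illustration. It handles parentheses and nothing else: it ignores other syntax differences (such as `\|` versus `|`) and will mishandle parentheses inside character classes.

```python
import re


def swap_paren_escaping(pattern: str) -> str:
    """Turn escaped parens into bare parens and bare parens into escaped ones."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "()":
                out.append(nxt)       # escaped paren becomes a bare paren
            else:
                out.append(ch + nxt)  # any other escape is left unchanged
            i += 2
        elif ch in "()":
            out.append("\\" + ch)     # bare paren becomes an escaped paren
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


text = "Hello World! (Not really)"

# str-style group  ->  Python-style group
print(swap_paren_escaping(r"\(.*\)"))  # (.*)

# str-style literal parens  ->  Python-style literal parens
python_pattern = swap_paren_escaping("(.*)")
print(python_pattern)                  # \(.*\)

# Same result as Str.global_replace with the str pattern "(.*)".
print(re.sub(python_pattern, "!!!!!!", text))  # Hello World! !!!!!!
```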

Isn’t variety the spice of life? [ok ok, once I bit down on a giant chunk of ginger in a dish, and almost … well, almost had a digestive accident; too much spice can be dangerous]


That’s very inspiring; I think I was confused because I don’t have much experience in programming / CS and the only language I know is Python (I’m such a noob). Thanks a lot for replying!
