Anybody have a hankering to maintain `pcre2-ocaml`?

Chet_Murthy · August 18, 2023, 6:42pm

Recently I learned (from @Stephane_Glondu) that pcre (in Debian) is now obsolete, superseded by pcre2. Also, recently @tobil4sk ported the pcre-ocaml code to work with pcre2. The original maintainer of pcre-ocaml is busy with other things, so neither he nor @tobil4sk is interested in maintaining pcre2-ocaml.

I volunteered to do it, but I figured I should first ask if anybody else wants to do it, just b/c … well, b/c it seems a little presumptuous to just jump in on something like this. I personally want to see pcre2-ocaml exist and be maintained, b/c I use pcre often enough when re isn’t enough (e.g. doesn’t support the regexps I use).

If anybody’s interested in doing it, here’s the repo: tobil4sk/pcre2-ocaml on GitHub (OCaml bindings to PCRE, Perl Compatibility Regular Expressions)

If nobody raises their hand, I’ll get busy preparing a release.

nobrowser · August 18, 2023, 8:43pm

I’ll take a look. I have a strong interest in having a pcre2 library available.

glen · August 19, 2023, 7:59pm

Random question (sorry): would it even make sense to merge the efforts of pcre2-ocaml into re? I get that re is currently a pure-OCaml regex engine that can be used with several concrete syntaxes, including a subset of PCRE, and that it is argued to be more efficient than PCRE in its full glory (though perhaps the comparison in re’s readme uses the original PCRE engine, and PCRE2 has much improved that aspect?), whereas pcre2-ocaml is a binding to an existing C library. Still, having both packaged together might make sense. Then re would fully support the PCRE2 syntax and could switch engines as needed. And users would not have to pick a library.

Chet_Murthy · August 19, 2023, 8:30pm

Two thoughts:

1. re doesn’t use the pcre engine: it merely supports the pcre syntax.
2. And last I looked, that support was not complete. I don’t know whether that remains the case.

Sure, someday re will be sufficiently complete that nobody wants/needs to use pcre. But today, I don’t think that that’s the case.

Stephane_Glondu · August 20, 2023, 8:27am

I generally agree with @glen. Even though re is not on par with pcre, I have the feeling that the extra features are not always needed, and I would prefer a switch to re wherever they are not.

The latest 1.11.0 release of re improves compatibility with pcre. In particular, I’ve added support for named groups and some control characters (but not \Cx nor \ddd, which are subsumed by \xdd). However, one notable feature is missing in re: back references in regexps. They are not trivial to add, and I’m less comfortable implementing them.

So pcre2-ocaml could still be useful if these back references are actually needed… until re supports them. Maybe @vouillon could tell us more about this?

viritrilbia · September 2, 2023, 7:55am

Does re support utf8 strings? So that, for instance, a multibyte character can appear in a character class and be treated as one character rather than several? pcre has a flag to behave like this, but I don’t see anything analogous in re.

Stephane_Glondu · September 2, 2023, 9:48am

> Does re support utf8 strings?

I don’t think so.

Chet_Murthy · September 2, 2023, 3:03pm

I’ve queued a PR to release pcre2-ocaml. I didn’t do much review of the code – just released it so I can get going with other packages that depend on it for testing.

nobrowser · September 3, 2023, 2:56pm

If you’re trying to avoid the pcre wrapper by any means necessary (maybe because of the bug you pointed out in another thread), I believe you should be able to use ulex with some massaging. ulex obviously knows about unicode, but (of course) its regexps lack the power of pcre.

viritrilbia · September 3, 2023, 5:00pm

Thanks for the suggestion! At the moment I’m successfully working around the bug, and the pcre syntax is more familiar to me (and more powerful), so I’m sticking with pcre.
