Recently I learned (from @Stephane_Glondu) that pcre (in Debian) is now obsolete, superseded by pcre2. Also, @tobil4sk recently ported the pcre-ocaml code to work with pcre2. The original maintainer of pcre is busy with other things, so neither he nor @tobil4sk is interested in maintaining pcre2-ocaml.
I volunteered to do it, but I figured I should first ask whether anybody else wants to, just because, well, it seems a little presumptuous to jump in on something like this. I personally want to see pcre2-ocaml exist and be maintained, because I use pcre often enough when re isn’t enough (e.g. it doesn’t support the regexps I use).
If anybody’s interested in doing it, here’s the repo: tobil4sk/pcre2-ocaml on GitHub, "OCaml bindings to PCRE (Perl Compatibility Regular Expressions)".
If nobody raises their hand, I’ll get busy preparing a release.
I’ll take a look. I have a strong interest in having a pcre2 library available.
Random question (sorry): would it even make sense to merge the efforts of pcre2-ocaml and re? I get that re is currently a pure-OCaml regex engine that can be used with several concrete syntaxes (including a subset of PCRE) and is argued to be more efficient than PCRE in its full glory (though perhaps the comparison in re’s readme uses the original PCRE engine, and PCRE2 has much improved on that aspect?), whereas pcre2-ocaml is a binding to an existing C library. Still, having both packaged together might make sense. Then re would fully support the PCRE2 syntax and could switch engines as needed. And users would not have to pick a library.
re doesn’t use the pcre engine: it merely supports the pcre syntax. And last I looked, that support was not complete. I don’t know whether that remains the case.
Perhaps someday re will be sufficiently complete that nobody wants or needs to use pcre. But today, I don’t think that’s the case.
I generally agree with @glen. Even though re is not on par with pcre, I have the feeling that the extra features are not always needed, and I would prefer a switch to re when they are not.
The latest 1.11.0 release of re improves compatibility with pcre. In particular, I’ve added support for named groups and some control characters (but not \ddd, which is subsumed by \xdd). However, one notable feature is still missing in re: back references in regexps. They are not trivial to add, and I would be less comfortable implementing them. pcre2-ocaml could still be useful if these back references are actually needed… until re supports them. Maybe @vouillon could tell us more about this?
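For anyone unsure what is at stake with back references: a back reference matches again whatever text an earlier group captured, which is more than a plain regular language can express. Here is a minimal sketch of the idea. It deliberately uses neither re nor pcre2-ocaml; it uses the Str library that ships with OCaml, which is not mentioned in this thread and is swapped in only because its \1 syntax is small enough to show the concept. The names doubled and is_doubled are made up for illustration.

```ocaml
(* Needs the str library bundled with OCaml,
   e.g. ocamlfind ocamlopt -package str -linkpkg backref.ml *)

(* In Str syntax, \( ... \) is a group and \1 refers back to the text that
   group matched. This pattern accepts a lowercase word that is immediately
   followed by an exact copy of itself, and nothing else. *)
let doubled = Str.regexp "^\\([a-z]+\\)\\1$"

(* True when the whole string is some word written twice in a row. *)
let is_doubled s = Str.string_match doubled s 0

let () =
  List.iter
    (fun s -> Printf.printf "%s: %b\n" s (is_doubled s))
    [ "abcabc"; "abcabd"; "aa" ]
```

In PCRE syntax the same pattern would be written roughly as ^([a-z]+)\1$. That is the kind of regexp that, per the post above, re cannot handle yet.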
Does re support utf8 strings? So that, for instance, a multibyte character can appear in a character class and be treated as one character rather than several? pcre has a flag to behave like this, but I don’t see anything analogous in re.
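To make the question concrete, here is a small sketch of why this matters, using only the OCaml standard library (no re or pcre involved, and it says nothing about how either library behaves). A character such as "é" occupies two bytes in UTF-8, so a matcher that works byte by byte would see two class members where the user meant one. The helper name utf8_length is made up for illustration.

```ocaml
(* Requires OCaml >= 4.14 for String.get_utf_8_uchar. *)

(* Count characters (Unicode scalar values) by decoding the string as UTF-8,
   one character at a time. Invalid bytes are counted as one character each. *)
let utf8_length s =
  let rec go i n =
    if i >= String.length s then n
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (n + 1)
  in
  go 0 0

let () =
  (* "é" is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9. *)
  let s = "\xc3\xa9" in
  Printf.printf "bytes: %d, characters: %d\n" (String.length s) (utf8_length s)
  (* prints "bytes: 2, characters: 1" *)
```

A byte-oriented engine given a class containing "é" would therefore treat it as a class of the two bytes 0xC3 and 0xA9, which is the behaviour the question is trying to avoid.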
I’ve queued a PR to release pcre2-ocaml. I didn’t do much review of the code; I just released it so I can get going with other packages that depend on it for testing.
If you’re trying to avoid the pcre wrapper by any means necessary (maybe because of the bug you pointed out in another thread), I believe you should be able to use ulex with some massaging. ulex obviously knows about unicode, but (of course) its regexps lack the power of pcre.
Thanks for the suggestion! At the moment I’m successfully working around the bug, and the pcre syntax is more familiar to me (and more powerful), so I’m sticking with pcre.