Why there is no easy pattern matching for strings?

Bob_Fang · May 25, 2017, 4:26pm

Hi, I have had this confusion for a while. What I really want is some thing like this:

let rec reverse x= 
  match x with
  |"" -> ""
  |e ^^ s -> (reverse s) ^ e;; 
  (** ^^ is the operator I propose to use for string constructor *)

But it seems this is never implemented. Is there a particular reason? Is this a deliberate design choice or we just don’t think it is not needed?

I am asking because sometimes I found writing a recursive algorithm on string is useful but I have to use an accumulator/index to simulate it which is a bit clumsy.

octachron · May 25, 2017, 6:24pm

In short, defining string as a list of char’s is both very inefficient in term of memory and tends to lead to erroneous API when applied to text.

Indeed a text is not only a sequence of chars: consider for instance à la carte. Neither à nor the fine unbreakable space " " can be easily and uniquely mapped to 8 bits encoded characters. And for instance using an utf-8 encoding for this text and applying your reverse function will not give the expected text.

Since writing a API that maps well to the human interpretation of text is hard, a compromise is to use space efficient memory representation for strings in the standard library and let exterior libraries handle the problem of human interpreted texts.

mjambon · May 25, 2017, 7:23pm

A while ago (2004) I wrote mikmatch, which was a syntax extension that would let you match strings against regexps and capture substrings. It was implemented with camlp4, which turned out to be problematic.

In my experience it’s very useful when routinely dealing with a variety of poorly-designed data formats. However, for software engineering in general, it’s rarely useful because data come in as json and there’s no need for further parsing.

Since moving away from bioinformatics in 2007 I only used it in one case that was really worth it and it was for parsing street addresses.

I know a few people are still using mikmatch happily but it doesn’t seem like a good choice unless you really know what you’re doing.

th3rac25 · May 26, 2017, 8:05am

Maybe check out bitstring https://github.com/xguerin/bitstring

paurkedal · May 27, 2017, 4:03pm

I used mikmatch some years ago for log parsing. In my own judgement, it was just what was needed to place OCaml ahead of Python and almost on par with Perl in quick-and-dirty text processing.

paurkedal · May 27, 2017, 10:08pm

Now, this code is not well tested, but I created https://github.com/paurkedal/ppx_regexp. There is no release yet, so you’d have to pin it (opam pin add ppx_regexp git@github.com:paurkedal/ppx_regexp.git) or build manually.

In the original example, discussions about Unicode aside, there is an essential issue, namely that the cut-point between e and s is undetermined. Regular expressions offers a solution, as they can determine not only whether a pattern matches, but the extents of marked subpatterns. The example can now be written

let rec reverse x =
  match%pcre x with
  | "^$" -> ""
  | "(?<e>.)(?<s>.*)" -> reverse s e

Drup · May 28, 2017, 1:27am

On the subject of mikmatch, tyre can pretty much act as a modern typed version of mikmatch, without using any syntax extensions. In particular, it does efficient routing and has very nice abstraction feature and other goodies:

let re = Tyre.route [
  str"foo" *> int   --> fun i -> print_int i;
  str"bar" *> float --> fun f -> print_float f;
]

let () = Tyre.exec re "foo4"

Since it’s combinator-based, it’s a bit less terse, but you gain in readibility. I never added a ppx extension for it, but I guess it could be useful (using conventions like in format, for example).

(Note that a reverse function should never be written like that anyway. It’s wrong both in term of efficiency and semantics.)

Topic		Replies	Views
Pattern matching on private strings Community type-system	33	4693	February 24, 2018
I want to merge two pattern matching functions Learning	6	3849	August 8, 2018
Regular Expression Learning regexp	19	2977	January 25, 2022
Expression-tree and graph pattern-matching? Ecosystem	3	559	December 5, 2022
Wildcards in pattern-matching / recursive function Learning recursion , pattern	2	649	November 30, 2023

Why there is no easy pattern matching for strings?

Related topics