Musings on extended pattern-matching syntaxes

Hello Discourse,

I thought it might be amusing to share two ideas for extending OCaml’s pattern-matching syntax. (I don’t plan to implement them for now because technically it would be difficult.)

with-patterns using and

Agda has with abstractions, which allow to start, in the middle of a pattern-matching, to further match on the value of an expression:

filter_map : {A B : Set} → (A → Maybe B) → List A → List B
filter_map f [] = []
filter f (x ∷ xs) with f x
filter f (x ∷ xs)    | Some y  = y ∷ filter_map f xs
filter f (x ∷ xs)    | Nothing = filter_map f xs

The experimental with-patterns extensions of the ocaml-patterns project had a restrict proposal of “with patterns”: pattern with pattern = expression

let rec filter_map f li =
  match li with
  | [] -> []
  | x :: xs with Some y = f x -> y :: filter_map f xs
  | x :: xs with None = f x -> filter_map f xs

Personally, I’m not fond of the use of with in this position, because in match .. with the with keyword sits between the expression and the handler. So I would propose to use and <expr> with <pat> instead of with <path> = <expr>:

let rec filter_map f li =
  match li with
  | [] -> []
  | x :: xs and (f x with Some y) -> y :: filter_map f xs
  | x :: xs and (f x with None) -> filter_map f xs

match li with x :: xs and f x with Some y” actually sounds nice.

Note: one could think of and as a natural keyword for “pattern intersction”, a feature that of course should exist in OCaml, but we don’t need to reserve it for this usage as we already have a pattern-intersection keyword, namely as: p as q is the pattern that matches values that match both p and q, and experts in the environment the pattern variable bound in both p and q.

Note: with-patterns generalize Haskell’s view-patterns (<expr> -> <pat> as a pattern matching on a foo when the expression is a conversion function of type foo -> bar and the sub-pattern matches on a bar) by defining f -> p as syntactic sugar for x and (f x with p) for a fresh x.

Note: this does not give you all the power of Agda with-patterns, as you cannot share the matching context (the patterns on the original expressions) and return different right-hand-side based on the new expression only, forcing you to duplicate more code. But this is solved the next proposal!

Generalized handlers

The current OCaml grammar for pattern-matching is of the form match <expr> with <handler>, or function <handler>, where <handler> is a list of clauses

<pattern> [when <expr>] -> (<expr> | .)

The idea of “generalized handlers” is to allow handlers to be nested inside patterns instead of begin only at the toplevel. For example, the following are equivalent:

match e with
| (p, q1) -> e1
| (p, q2) -> e2

match e with
| (p, (q1 -> e1 | q2 -> e2))

In the second example, the handler is inside a (p, _) pattern (which does not itself have a -> ... right-hand-side). The scoping environment of the expressions inside a nested handler contains all the pattern variables that are defined in the pattern context around them – including some that are visually “to the right” of the expression. For example, the following would be (dubious but) legal:

let negate : sign * int -> int = function
| ((Neg -> -x | Pos -> x), x)

Finally, generalized handlers combine with and-clauses (by the simple rule that handlers are allowed whenever a pattern was allowed before) to provide Agda-style with-abstraction:

let rec filter_map f li = match li with
| [] -> []
| x :: xs and f x with (
  | Some y -> y :: filter_map f xs
  | None -> filter_map f xs
)
5 Likes

Jeremy Yallop and I used to spend quite a bit of time thinking about extending pattern matching. We also came to the conclusion that you want to be able to write:

let rec filter p li = match li with
| [] -> []
| x :: xs and f x with (
  | true -> x :: filter p xs
  | false -> filter xs
)

Although we envisioned it as a single construct rather than the combination of a restricted and and “generalized handlers”. I think I would probably still prefer it that way as the generalized handlers idea looks a bit too weird for my taste, but maybe I’m just not used to them.

One point is that you need to have the parentheses to separate the clauses of the and. For this we thought it would be good to allow match cases to be wrapped in [ ... ] and then require that in the and case. I think that was part of the motivation for this PR, which I think would be worth revisiting.

Whist on the topic, another feature that I think is worth having is what I would call with patterns. They are designed to work with or-patterns to let you do:

match x with
| Some y | None with y = 0 -> y

Note that they are fundamentally different from the with patterns that you talk about above. In the ones you mentioned when the with match fails then the case just falls through – and correspondingly there is no warning if the pattern is not exhaustive. With this version it would raise a Match_failure exception if the with pattern failed, even if there was a later case to fall through to – and correspondingly using a non-exhaustive pattern results in a warning.

In the semantics that I have in mind, p and (e with q) only matches values that match p and for which the value of e (which may use variables from e matches q), so your with p = e is entirely subsumed by the and e with p. You can write Some y | None and (0 with y). I agree that it doesn’t read as nicely as with y = 0 for this specific usage, but it extends better to the more complex usages (in particular, for view patterns, it respect the natural ordering of the function-part and the pattern-part, even in absence of the syntactic sugar).

The “Active Patterns” in F# are also related:

Don Syme, Gregory Neverov, and James Margetson. 2007. Extensible pattern matching via a lightweight language extension. In Proceedings of the 12th ACM SIGPLAN international conference on Functional programming (ICFP '07). ACM, New York, NY, USA, 29-40. DOI: https://doi.org/10.1145/1291151.1291159

An interesting, I think, point made in the ICFP07 paper is that such a language construct eases the tension between wanting library clients to have the convenience of pattern matching versus exposing the internal representation. There are other approaches to easing that tension, but an extension of pattern matching like this would make for a very clear idiom.

They are not the same for the reason I gave: they have different behaviour when the pattern does not match. Consider:

match x with
| None with true = false -> 0
| _ -> 1

vs.

match x with
| None and false with [ true -> 0 ]
| _ -> 1

In the first case if x is None then we get a Match_failure exception whilst in the second case we get 1. Similarly, the first will give you an exhaustivity warning whilst the second won’t.

This difference comes from a fundamental difference in what the user intends when using with vs when using and. Using with is a means to bind additional variables, it does not affect which case will be selected, it is so that you can use or patterns in more places – and so you want a warning if your pattern is not exhaustive. Using and is a means to further specify when a case should hold by further matching on additional data – the whole point is to allow the pattern to be non-exhaustive and so you obviously do not want the warning in that case.

1 Like

Ah, I see; apologies for misreading your message above.

I don’t really like this aspect of the semantics you have implemented for with p = e; it gives the impression of being an artifact of the implementation strategy rather than a purposeful design. Note that, in the case of with (Some v) = lookup x, you can write as follows using generalized handlers

| Some x and (lookup x with Some v -> e1 | None -> raise Not_found)
| None -> e2

which I think is better than just with Some v = lookup x -> e1 and letting users reason about match failures.

1 Like

I don’t really like this aspect of the semantics you have implemented for with p = e ; it gives the impression of being an artifact of the implementation strategy

I don’t see why the choice is nothing to do with implementation and everything to do with intended semantics. The key thing here is to make sure that you get an exhaustivity warning when you should get one. Using and to implement something that should really be with is inherently unsafe because a simple change of datatype – i.e. adding a new case – can silently change the meaning of the program.

I think is better than just with Some v = lookup x -> e1 and letting users reason about match failures.

The whole point is to essentially ban with Some v = lookup x -> e1 because it will give you an exhaustivity warning – because for that case you should be using and instead of with.

To make this a little clearer: I am not proposing with as an alternative to and but as an accompaniment to it.

If you only add with then you don’t support the use cases that require and and if you only add and then you encourage people to use the wrong tool for the job and get less safe code as a result.

I think it would be fairly reasonable to expect expressions introduced in and <expr> with .. to be exhaustively covered by patterns, and have a warning otherwise. If we really went for the design with the two features decoupled, I think we would need to reason up to sharing of and on the same sub-scrutinee (for direct syntactic equality), so that all formulations of filter_map in my first message are considered exhaustive on f x. This is a bit more detailed a specification than I’m willing to go today, though.

I also understand your point on complementary, but one aspect of my proposal that I liked is that the two features can be combined to subsume several pattern extensions that we collectively had in mind for a wile (your form of with patterns, view patterns, and the “generalized handlers” that I had on my radar anyway). So, maybe as a silly design exercise, maybe as a more serious temptation to hone in a robust design, I like the idea of trying to do “everything” with only those two features. (Also, the syntactic space is getting crowded and parser conflicts are inevitable, so there are only so much extensions we can fit inside the same dream world.)

I’m not sure that is reasonable. Doesn’t it just reduce the feature to nested match statements?

Since we talk about pattern matching, is it imaginable that when could
be a normal part of the pattern (and thus used in sub-patterns)? It’d be
helpful for some heavy uses of, say, Zarith.

Also, Haskell has with syntax for patterns built-in. Moreover, it has PatternSynonyms a.k.a. Active Patterns from F#.

I propose not inventing a wheel and get inspiration from Haskell, where the code

Prelude> :{
Prelude| f = \x ->
Prelude|   case x of 
Prelude|     Nothing | True <- False -> 1
Prelude|     _ -> 0
Prelude| :}
Prelude> :t f
f :: Num p => Maybe a -> p

evaluates with fallthrough

Prelude> f Nothing
0

Note that Haskell does not have or-patterns meaning they have no use for with patterns, so it is not surprising that they only support the and behaviour.

Sorry for taking away from the main topic, but let me ask you.
Is it possible (and does it make sense in your opinion) to introduce a shortcut for the keyword ‘function’?
For example:

let rec map f
| [] -> []
| x::xs -> f x :: map f xs

instead of

let rec map f = function
| [] -> []
| x::xs -> f x :: map f xs

@c-cube: the current implementation of the pattern-matching compiler is not amenable to nested when – and for the same reason to the and with construct I suggest above which subsumes them. I hope to revisit the implementation in the future (hopefully next academic year), and maybe that could work then.

@lpw25: I don’t think that exhaustive and-patterns reduce (in a conciseness-preserving way) to nested matches, given that they still let you express (Some x | None and 0 with x) and view patterns. Note that a trailing | _ -> e clause makes any and e with ... above it exhaustive, given that it subsumes _ and e with _.

@Kakadu: Pattern Synonyms is also something that would be interesting (their need is somewhat reduced by view patterns, but they are an independently nice and well-defined feature). For previous musings on pattern synonyms, see Gagallium: Pattern synonyms, arbitrary patterns as expressions.

@dm01: I’m more interested in features that let you express new program structures that were not available before – expressing the same thing with existing syntax requires duplicating or at least reordering code. I’m less interested in shorter ways to express the exact same program structure.

1 Like

Ok, so they don’t reduce to nested matches when you allow them under or-patterns, but any use not under an or-pattern will just be a nested match.

In particular, they no longer allow you to express things like:

match .. with
| Some key and Map.find_opt m key with [ Some data -> ... ]
| _ -> ...

which, in my understanding, is a key aim of these kinds of patterns.

Put another way, if you want to support the use case above and also the or-pattern use case then you need to use two different constructs because one requires non-exhaustivity and the other requires exhaustivity. I had thought that the and syntax only made sense for the first use case – but, as you show, it could also work for the second. Either way I think you still can’t get away from needing two constructs to support these two use cases.

1 Like

A small offtop: why we decided not to implement whens near or-patterns? Is it technical difficulty or ideological?

match ... with 
| [x] when foo x
| [x; _] -> ... 
| ....

Whether or not we like the feature (personally I’m not fond of when), it is quite difficult to implement today; this is what @c-cube asked above.

Personally, I’m afraid I can’t see the appeal of these kinds of syntax complications.

I don’t understand why Agda’s

filter f (x ∷ xs) with f x
filter f (x ∷ xs)    | Some y  = y ∷ filter_map f xs
filter f (x ∷ xs)    | Nothing = filter_map f xs

would be preferable to the vanilla OCaml

let filter f (x :: xs) = match f x with
  | Some y  -> y ∷ filter_map f xs
  | Nothing -> filter_map f xs

The latter is less verbose, and just as clear, imo. This is a special case of my preference for case analysis in explicit match expressions over the restatement of the function name and argument in each case. IMO, the latter is strictly worse. It makes sense in the context of Prolog-esque non-deterministic relational proramming, but seems like needless sugar in a functional language. In my experience, it usually results in needless redundancy.

Likewise, I don’t understand why the proposed

let rec filter_map f li =
  match li with
  | [] -> []
  | x :: xs and (f x with Some y) -> y :: filter_map f xs
  | x :: xs and (f x with None) -> filter_map f xs

would be at all preferable to

let rec filter_map f = function
  | [] -> []
  | x :: xs -> match f x with
    | Some y -> y :: filter_map f xs
    | None -> filter_map f xs

In my opinion, the former looks like a more verbose and convoluted way of expressing the branching logic expressed in the latter quite clearly. The proposed syntax seems to give the superficial illusion of reducing nesting, but in fact it’s only rephrasing the nesting in a less direct idiom at the cost of increased redundancy (namely the uninformative repetition of x :: xs and f x).

In the case of

match e with
| (p, q1) -> e1
| (p, q2) -> e2

match e with
| (p, (q1 -> e1 | q2 -> e2))

it seems like all we really want is a way to refine our matching on the second argument of the tuple without redundant binding of the first. Again, it seems we already have very clear syntax for this:

let (p, q) = e in
match q with
| q1 -> e1
| q2 -> w2

this requires a few more characters, but follows the OCaml ethos of explicit and clear communication.

Note that the example code given to combine these two novel syntax proposals

let rec filter_map f li = match li with
| [] -> []
| x :: xs and f x with (
  | Some y -> y :: filter_map f xs
  | None -> filter_map f xs
)

is virtually identical to the standard way we would already deal with this

let rec filter_map f li = match li with
  | [] -> []
  | x :: xs -> match f x with
    | Some y -> y :: filter_map f xs
    | None -> filter_map f xs

This makes it looks like and is just short-hand for -> match. Since this only saves us 5 characters at the expense of even more arcane overriding of existing keywords, I can’t see why it would be preferable!

All this makes me wonder if I am missing some important context…

1 Like