Musings on extended pattern-matching syntaxes

In my opinion, the former looks like a more verbose and convoluted way of expressing the branching logic expressed in the latter quite clearly.

The example of filter_map might not be very compelling to you, but I frequently see / write code that looks like:

match f () with
| Some y ->
  begin match g y with
  | A x ->
    begin match h x with
    | Ok result -> result
    | Error _ -> raise Not_found
    end
  | _ -> raise Not_found
  end
| None -> raise Not_found

which I find frustrating. I guess the proposed syntax would help here (though it’s not clear how readable it would be).

1 Like

I agree that such code is unpleasant :frowning:

I think this is how we’d render your example in the proposed syntax:

let foo () =
  match f () with
  | Some y and g y with (
      | A x and h x with (
          | Ok result -> result
          | Error _ -> raise Not_found
      )
      | _ -> raise Not_found
  )
  | None -> raise Not_found

But if we trade the begin/end brackets with (/), we get code that is virtually identical

let foo () =
  match f () with
  | Some y -> (match g y with
      | A x -> (match h x with
          | Ok result -> result
          | Error _ -> raise Not_found
      )
      | _ -> raise Not_found
  )
  | None -> raise Not_found

Of course, this doesn’t call for the “generalized handlers”, so maybe one could contrive an example including tuples that would be a bit harder to dispatch with vanilla OCaml.

However, neither of this are significantly better than your original example, imo (and I suspect similar scenarios that called for generalized handlers would only be worse).

If I encountered such code in a project I was working on, my first inclination would be to try refactoring/restructuring, in hopes of making it non-exceptional and avoiding the use of three different idioms for partial functions. Failing that, I’d opt for combinators to tidy up the nested branching. Something along the lines of

open Core

let find_a = function
  | A x -> Some x
  | _ -> None

let foo () =
  f ()
  |> Option.map ~f:find_a
  |> Option.value_map ~f:h ~default:(Error Not_found)
  |> Result.map_error ~f:(fun _ -> Not_found)
  |> Result.ok_exn

In general, I think I’d rather see more disciplined delegation of branching logic into combinators and helper functions than a proliferation of keywords in our pattern matching syntax.

3 Likes

Hum, I would have imagined something like:

let foo () =
  match f () with
  | Some y and (g y with A x and (h x with Ok result)) -> result
  | _ -> raise Not_found

But I might be mistaken.

4 Likes

Yes, the compact form of @trefis is correct. (Note: it is also exhaustive on g y and h x thanks to the fallback case that follows.)

I agree that map_filter is not such a good example, because it is exactly a nested match. My previous example was filter that was even worse. The point is that sometimes the structure of the inner matches interact with the structure of the outer matches in ways that you cannot reproduce with a sub-match – expressing this would require a sort of continue with (x = foo, ...) construction to fall back on the rest with a partial environment, and this is off the table.

The example of @trefis illustrates that but only in a very simple way (the interaction is “fall back to the rest if not taken”); the use-case of (Some x | None and 0 with x) is another one that illustrates this (here the idea is to bind some parts of the environments to values that are not extracted from the scrutinee). I don’t know of a very good example, that would be simple and naturally illustrates the full range of interesting behaviors.

3 Likes

Ah ha — that really helps helps clarify the motivation for me. Thanks for the elucidation @trefis and @gasche!

1 Like

I had a discussion with @lpw25 to try to find an actually-working instance of the situation he has in mind where with p = e should enforce the exhaustivity of p locally, without looking at further clauses, to respect the programmer intent. We wrote an example on a napkin and he said he would post a message here. Ping Leo :slight_smile:

I don’t remember the precise example but the general setting was as follows: you have a datatype 'a foo that, for now, has a single constructor Foo of 'a; in particular, the pattern Foo x is exhaustive. In a later change, you add another constructor to this type, and we look at whether existing uses of Foo x will now raise an exhaustivity warning – intuitively, they should all do so.

For instance if you look at

match e with
| (y, Foo x) -> ...
match e with
| y and lookup tbl y with Foo x -> ...

In both cases, the clauses are currently exhaustive, and adding a second constructor beyond Foo would break that property – and you get an exhaustivity warning, which is the desired behavior. On the other hand, if you complete either of these clauses with a catch-all clause | _ -> ..., then you have a fragile pattern-matching, and adding another constructor to Foo will remain exhaustive – a bad thing to happen, which is why we discourage fragile clauses with a warning. Note that in the second example, the"local" and “global” exhaustivity criteria for with p = e or and e with p work in the exact same way in absence of fragile patterns.

Leo had a more sophisticated example where the two criteria (local and global) would not work in the same way, and he claimed the local-exhaustivity semantics was the better choice. It looked somewhat like

  match e1, e2 with
  | None, true -> 1
  | (Some x with Foo y = lookup tbl x), false -> y
  | _, true -> 3
  | _, false -> 4

but note that this pattern is actually fragile in its first argument. No fragility warning is shown, I believe because 'a option is a built-in type, but the following warns:

type t = NoInt | SomeInt of int

let test x y =
  match x, y with
  | NoInt, true -> 1
  | SomeInt x, false -> x
  | _, true -> 3
  | _, false -> 4
[@@warning "+4"]

My current intuition is that occurrences of Foo x as the pattern p in the constructs ... with p = e or .. and e with p has essentially the same exhaustivity behavior as within a pair pattern (..., p), and that the exhaustivity issues that Leo has in mind are orthogonal to whether these new constructions are used. In all cases, we expect a move from a single-constructor type to a several-constructor type to raise an exhaustivity warning, unless the set of clauses is already fragile (in the sense that a pattern that matches additional constructors of the ... part and accepts anything on the p part is subsumed by existing clauses).

To reiterate some of what’s stated above from a different angle, OCaml has two different types of pattern matching: let-binding and match-binding.

With let-binding one can bind the components of a pattern to the value of an expression, writing the pattern before the expression:

let pat = exp

Failure of the pattern to match the value causes an exception (Match_failure).

With match-binding one can also bind the components of a pattern to the value of an expression, writing the expression before the pattern:

match exp with pat

Failure of the pattern to match the value causes matching to continue with the other cases in the same match expression (raising Match_failure only when there are no more cases remaining.)

Both let-binding and match-binding are useful in practice; although match-binding can straightforwardly simulate let-binding, let-binding is probably more common in programs.

It would be useful to extend match with both nested let-binding and nested match-binding. The idea in both cases is to support further computation/binding with the available values within the context of a case. A match expression may reveal information about some of the components of a value (e.g. that the left component of a pair is Some x); we may wish to perform further analysis with that knowledge without abandoning the match, leaving ourselves the option to continue with the other cases. Here is an example in which, when the nested matching fails, matching continues with the outer cases:

match p with
[ (Some x, _) and f x with
              [ Some y -> x + y ]
| (_, Some y) and g y with
              [ Some z -> y - z ]
| _ -> d]

Clearly this is a generalization of when, but more expressive — just as match is more expressive than if.

Nested let-binding within match is useful, too — particularly in or patterns as noted above. Here is an example that shows how nested let binding (introduced with the with keyword) can be used to share right-hand-sides between branches that bind different variables:

match p with
[ Some x, Some y
| Some x, None with y = d -> x + y
| _ -> d ]

(Both these examples demonstrate nested binding at the outermost level of a case, but the design easily extends in both cases to support nested binding deeper within a pattern, and it’s sometimes convenient to write things that way.)

From an ergonomic point of view, some syntax along the lines in these examples — where nested match-binding follows the syntactic order of match (kw expression with pattern) and nested let-binding follows the syntactic order of let (kw pattern = expression) — seems like a wise choice.

To my mind the main (and really the only serious) objection to the whole design is the fact that nested match-binding returns control to the outer match when there are no cases remaining, rather than raising Match_failure, as the current unnested match does. One consequence is that nested match cannot be checked for exhaustiveness; at the same time, inexhaustive matches are the only situation where the nesting is genuinely useful. If it weren’t for this objection, I’d have been pressing to have this in the language a long time ago.

8 Likes

Yet, I hope I will not push open doors.

I like to see construction like the when or the proposed extensions as purely syntactic sugar.
Thus, they are dispensable but valuable for human understanding.

I have the intuition that there is always a way to automatically rewrite them with basic construction, using continuation (some trivial examples below).

Moreover, from stylistic point of view, I don’t really like the with-overuse in

Here, I would prefer a word like where y = d but I don’t want to add a new keyword…
In my opinion, d with y is not that bad but could be abbreviated with the as keyword: d as y (I can no longer find who proposed this)

For functional design, I wonder if the use of and is not misleading as I would expect the scope of the first match to not be available for the next ones (same as let ... and ... in binding if I’m not mistaken).
Here, the keyword then could be used to make the matchs “sequential”:
match ... with Some y then f y with ...
Both and and then could then be used in any combination (with the same associativity and precedence rule than && and ||).

Now, here are some examples of rewriting:

  • when pattern
let foo = function
  | Some x, _ when f x -> x
  | _, Some y when g y -> y
  | Some x, _ when h x -> x * d
  | _ -> d

becomes

let foo =
  let foo_y f = function
    | Some y ->
      if g y then y
      else f ()
    | _ -> f () in
  function
  | Some x, y ->
    if f x then x
    else foo_y (fun _ -> if h x then x * d else d) y
  | _, y -> foo_y (fun _ -> d) y
  • and pattern
type t =
  | Push of int
  | Pop
  | Add
  | Incr

let apply stack = function
  | Push x -> x :: stack
  | Pop and stack with _ :: stack' -> stack'
  | Incr and stack with x :: stack' and 1 as y (* 1 with y *)
  | Add and stack with x :: y :: stack' -> x + y :: stack'
  | Pop | Incr | Add -> raise Not_found

becomes

let apply stack t = match t, stack with
  | Push x, _ -> x :: stack
  | Pop, _ :: stack' -> stack'
  | Incr, x :: stack' -> x + 1 :: stack'
  | Add, x :: y :: stack' -> x + y :: stack'
  | Pop, _ | Incr, _ | Add, _ -> raise Not_found
  • then pattern (generalization of when)
let foo = function
  | Some x, _ then f x with [ Some y -> x + y ]
  | _, Some y then g y with [ Some z -> y - z ]
  | Some x, _ then h x with [ Some y -> x * y ]
  | _ -> d

becomes

let foo =
  let foo_y f = function
    | Some y ->
      begin match g y with
        | Some z -> y - z
        | _ -> f ()
      end
    | _ -> f () in
  function
  | Some x, y ->
    begin match f x with
      | Some y -> x + y
      | _ -> foo_y (fun _ -> match h x with Some y -> x * y | _ -> d) y
    end
  | _, y -> foo_y (fun _ -> d) y

My intuition is that we can check nested bindings for exhaustiveness (in addition to being able to use _ -> . to express the intent of exhaustiveness; we agree one should get useful warnings even when we are not disciplined enough to use this construction), and that it works in the way that one would expect.

In the current setting, exhaustiveness checking is formulated for a $n$-ary match construct (matching on n values at once) with given scrutinee types, and a list of clauses each maching on those $n$ values with patterns. Each clause corresponds to a “matching space” (a set of values that match the clause), and a pattern-matching is “exhaustive” if the union of the spaces of each clause covers the entire space of the scrutinee types.

The “nested let-binding” do not influence exhaustiveness checking, in the sense that they should warn whenever their pattern is refutable (not exhaustive at its scrutinee type), and otherwise they directly preserve exhaustivity. For the “nested match-binding”, the idea is that we are now in a slightly more complex setup where the “matching spaces”, and also the arity of the inputs, depend on the shape of some of the input values (just as matching on a dependent pair/product in a dependently-typed language; it’s not surprise that Agda has a very similar with construct): when we write p and e with p', the matching space of the whole pattenr-matching is extended with an extra scrutinee of the type of e within the sub-space of p, and the matching space of this clause for this new scrutinee is the space of p'.

For example:

match (e1 : int option) with
[ None -> ()
| Some p and (e2 : int option) with None -> () ]

The matching space of the first pattern is None. The matching space of the second pattern is (Some p) * None. The entire matching space for the scrutinees is
(x : int option) * (match x with Some p -> (e : int option) | _ -> ∅),
and in particular it contains the sub-space Some p * Some _ which is not covered by the clauses: this pattern-matching is not exhaustive.

In contrast, consider the following pattern-matchings:

match (e1 : int option) with
[ None -> ()
| Some p and e2 with [ None -> () 
                     | Some _ -> () ]
match (e1 : int option) with
[ None -> ()
| Some p and e2 with None -> () 
| Some p -> ()
match (e1 : int option) with
[ None -> ()
| Some p and e2 with None -> () 
| _ -> ()

The second (multi-)clause of the first example covers the complete subspace (Some p) * (_ : int option): it is exhaustive in the second matched value e2, but not in the first matched value e1, unless p is irrefutable: writing not p for pattern negation, the sub-space Some (not p) is not covered.

The second example has the same (incomplete) matching space: the third clause has matching space Some p * _; it is more general than the space of the second clause. It is also fragile in e2: adding new constructors to the datatype of e2 would preserve exhaustiveness.

The third example has a third clause with matching space _ * _ or, more precisely,
(x : _) * (match x with Some p -> _ | None -> .): it covers the whole scrutinee space, so this pattern-matching is exhaustive and fragile in both e2 and e1.

Remark on sharing vs. non-sharing semantics of identical subexpressions

The second and third examples share the property that the scrutinee space of e2 is implicitly covered by a subsequent clause – the clause does not talk about e2. One could ask whether

match (e1 : int option) with
[ None -> ()
| Some p and e2 with None -> () 
| Some p and e2 with Some _ -> ()

should be considered exhaustive: do the two occurrences of e2 denote the same matching space, or should they be considered two independent matching spaces? The answer to this question, of course, depends on whether we would like to specify a sharing semantics for identical subexpressions under shared bindings (the bound variables coming from p are bound in the same way in both occurrences of e2), with e2 evaluated only once for any scrutinee, or a non-sharing semantics where a subexpression is evaluated no matter what happened in previous clauses.

The sharing semantics for identical subexpressions has the nice property that C[p1 -> e1 | p2 -> e2] is always equivalent to C[p1 -> e1] | C[p2 -> e2]. The non-sharing semantics is easier to specify and implement, and it has the nice property that the pattern-matching behavior can always be made to coincide with the simple one-clause-at-a-time interpretation (turn each clause into a test, and check them from top to bottom).

Personally I prefer the sharing semantics, but I sense that my peers will overall prefer the non-sharing semantics. (One could imagine compromises where we use the sharing semantics when the dependent scrutinee is “obviously pure”, as used in the examples of @recoules above (and stack with ...), but I don’t think they are going to fly either.)

Using the non-sharing semantics makes exhaustivity checking easier and slightly less interesting. A further clause can also match over dependent subspace, but always implicitly and thus exhaustively.

A nice property of the syntax proposal (with clauses deep within patterns) is that we never need the sharing semantics to write any example: any example that would rely on the sharing semantics is of the form C[c1] | C[c2] (write lowercase-c for the syntactic category of clauses p -> e), we can turn it into the more compact C[c1 | c2] where sharing is baked into the syntax. This is not the case of SML pattern guards (see here and then look for “nested matches”), which need to rely on the sharing semantics to express interesting examples – I think this is a deep flaw in their syntax.

2 Likes

To summarize: with the non-sharing semantics, exhaustivity of “nested match-bindings” works as follows:

  1. If a nested-match p and e with p' has a sub-pattern (or sub-clauses) p' that exhaustively covers the scrutinee space of e, then this nested-match is exhaustive. The rest of the pattern can be analyzed for exhaustiveness without considering the nested-match any further.
  2. If the sub-pattern p' is not exhaustive, but a following clause covers the entirety of p, then the nested-match is exhaustive but fragile.
  3. Otherwise: p' is not exhaustive and the rest of p (morally p and e with not p', although this syntax requires the sharing semantics to make full sense) is not covered by a following clause. The nested-match is not exhaustive.

We wrote an example on a napkin and he said he would post a message here. Ping Leo :slight_smile:

Sorry for the very late reply. I think my example was something like:

  match e1, e2 with
  | None, true -> 1
  | (Some x with Foo y = lookup tbl x), false -> y
  | Some _, _ -> 3
  | None, false -> 4

which changes behaviour when a new constructor is added alongside Foo but is exhaustive before and after the change and is not fragile in the option or bool parts of the pattern.

My current intuition is that occurrences of Foo x as the pattern p in the constructs ... with p = e or .. and e with p has essentially the same exhaustivity behavior as within a pair pattern (..., p)

I think this intuition is right and that my point can be reduced to just the observation that in the pair case you can change from:

match x, y with
| p1, p2 -> ...
...

to:

match x with
| p1 -> match y with
            | p2 -> ...
            ...
...

if you want to force exhaustivity on the snd component by itself, but there is no equivalent transformation available in the with case under an or-pattern – so it is useful to provide both semantics directly with different syntax.

Although at this point I thinking I prefer Jeremy’s angle of think about them as just adding both let-binding and match-binding forms.

From the perspective of exhaustivity, we can in fact erase the dependency in the matched spaces. The exhaustivity of the following pattern-matching (right-hand-side erased), matching on one input and then dependently on a second

[ p1
| p2 and e2 with [p21 | p22]
| p3 ]

is equivalent to the exhaustivity of the following non-dependent one, matching on two inputs

[ p1, _
| p2, p21
| p2, p22
| p3, _ ]
1 Like

With the proposed syntax, it is always possible to reformulate
C[p] -> e
into
C[p -> e | _ -> .]
to force exhaustivity of p (with respect to the values that can flow into C[p] in the first place; this is slightly different from your transformation that enforces context-insensitive exhaustivity, which would be expressed with C[x and x with p -> e | _ -> .])

I prefer the functional style of OCaml and too often forget that expressions may have side effects… but, if I like the sharing semantics proposition, I think the non-sharing semantics would be preferable.
My point is – I’m assuming that only a function call can add non obvious side effects, if a developer duplicate the call of a function in a pattern-matching, ether he have good reasons to do so (non-pure) and thus, the pattern-matching is truly non-exhaustive and need warnings, either it is a very poor coding style and he deserves his warnings :wink:

Could we think about an annotation “ocaml.pure” to check if the body of the function is pure and then use it at call site? It would help deciding if the scrutinee is “obviously pure” or not.

it is always possible to reformulate …

That seems like a reasonable way to try and express it, but does it work under or-patterns without duplicating the expression?

Coming back to my original example:

match x with
| Some p1 | None with p2 = 0 -> e

It could be given the right semantics with:

match x with
| Some p1 -> e | None and (0 with p2 -> e | _ -> .)

but I can’t see how to do it without copying e.

It feels like I want to write something like:

match x with
| Some p1 | None and (0 with p2 | _ -> .) -> e

but it is not clear to me that your proposal allows this.

@lpw25 I’m sorry, but I don’t really understand your concern

In your example, for me, p1 is an arbitrary pattern which binds the variable p2 to an int. Then p2 is used in e. The goal being to add a default value to p2 for the case None, it isn’t?

First question, as p2 seems to be irrefutable, why are you adding | _ -> . ?
Second question, is the following snippet look like what you wanted?
Edit: I just realized I wrote exactly the same code…
I was confused by the fact that 0 was used for “arbitrary expression”. Indeed, for int, I think p2 can only be a binding to a variable

p2 is not a variable, it is a pattern (which binds the same variables as p1). The aim is to ensure that p2 is irrefutable, hence the addition of | _ -> ..

You are correct, there is no way to express this without duplicating the expression in this case.

What would solve this is an unreachable pattern, something which expresses the intent of being unmatcheable without being a clause. One could decide to be borderline and consider that _ -> . is both a clause and a pattern (because it never returns, it doesn’t need to be prevented from occurring more than once like other clauses), and in that case the version you want to write is allowed and solves the problem. I don’t know how I feel about it.

In this case, does adding | Some _, false -> . solve the problem of Foo exhaustiveness?

 match e1, e2 with
  | None, true -> 1
  | Some x, false and lookup tbl x with Foo y -> y
  | Some _, false -> .
  | Some _, _ -> 3
  | None, false -> 4

And so:

becomes

 match x with
  | Some p1
  | None and e' with p2 -> e
  | None -> .

@recoules: your approach of None -> . works, but it still results in duplication: now we are duplicating the context around p2, rather than the right-hand-side. Also (and relatedly), it is not a local transformation. So using p2 | _ -> . has better properties.