Musings on extended pattern-matching syntaxes

@gasche My (little) problem with (e' with p2 | _ -> .) is that I understand “refutes p2 or anything”.
May I propose a small cosmetic change with p2 | ..

Ending a match with | . could mean “and nothing else”.

 match x with
  | Some p1
  | None and e' with [ p2 | . ] -> e

I agree – here . plays the role of “unreachable pattern” that I mentioned previously – but I am not sure that the syntax can be made to work without conflicts. If the syntax space is available for it, I think we could essentially get rid of the -> . syntax and replace it with a dot in patterns, without a right-hand-side.

One consequence is that nested match cannot be checked for exhaustiveness

Looking at this statement again, I realise that it’s a little ambiguous, since “nested match” can mean either of two things. For example, in the following expression

match e₁ with 
[ p₁ and e with [ p₂ -> e₂ | p₃ -> e₃ ]]

“nested match” might mean either

  • the phrase and e with [ p₂ -> e₂ | p₃ -> e₃ ] (since it’s nested within the full matcb-expression)
  • the full match expression (since it involves nesting)

I intended the first of these, but I think @gasche picked the second interpretation. In any case, things turned out well: without the ambiguity we’d have missed out on @gasche’s interesting analysis of exhaustivity for matches with nesting.

Here’s a more explicit statement of what I was expressing concern about. Since the earliest days ML-family languages have had the feature that match cases are checked for exhaustiveness. ML programmers rely on this property when reading and writing code; whenever an ML programmer sees a case set of the form [ p₂ -> e₂ | p₃ -> e₃ ], he can be confident that p₂ and p₃ cover the whole scrutinee space. However, the nested match construct does not have this property. Even if the largest enclosing match expression is exhaustive, there is no need for each case set to be exhaustive — and, in fact, most of the usefulness of the nested match construct comes from situations where some of the inner case sets are not exhaustive. I was worried that the loss of this property would violate programmers’ expectations, and make code harder to understand.

However, having thought about it a bit more, I’m no longer so concerned by the problem. In fact, I think the lack of exhaustiveness of inner case sets should be seen as an essential feature, not as a regrettable loss. Since nested match is only useful when the inner case set is inexhaustive, the compiler should check exhaustiveness, and warn if an inner case set is exhaustive. Exhaustive inner case sets should generally be rewritten either to use some form of let binding (if there is only one case) or to be broken into separate match statements (if there are multiple cases).

Here are some examples.

  1. match e with
     | Some x and f x with [ Some y -> x + y ]
     | Some x -> x
     | None -> 0
    

    This is fine; the case set of the nested match is inexhaustive, and so there should be no warning.

  2. match e with
     | Some x and f x with [ Some y -> x + y
                           | None -> x ]
     | None -> 0
    

    This is semantically fine, but a little unclear. The and suggests that the nested match might backtrack, which can never happen, since the inner case set is exhaustive. The compiler could usefully issue a warning and suggest that the code be written to use -> rather than and:

    match e with
     | Some x -> match f x with [ Some y -> x + y
                                | None -> x ]
     | None -> 0
    
  3. match e with
     | Some x and f x with (y, z) -> x + y + z
     | None -> 0
    

    This example is similar to the previous one, but the irrefutable pattern (y, z) need not involve match-binding at all; either the new nested let (with (y, z) = f x), or a standard let binding suffices:

    match e with
     | Some x -> let (y, z) = f x in x + y + z
     | None -> 0
    

These proposed warnings are similar in spirit to warnings 39 (“Unused rec flag”), and to the “Unused” warnings more generally: while code that unnecessarily uses let rec or code that unnecessarily uses match...and is semantically fine, it is written in such a way as to suggest that it uses more of the language than it actually does. In each situation there is a way to rewrite the code to be less misleading to the reader.

One more point. There is already one situation in the language where case sets are not usually exhaustive, namely in try expressions:

try e with
 | E₁ -> e₁
 | E₂ -> e₂

There is, in fact, a close analogy with nested match here. When no case in a try expression matches a propagating exception, control is transferred to the nearest dynamically-enclosing handler. Analogously, when no case in an inner match case set matches the scrutinee, control is transferred to the nearest statically-enclosing case set. For both situations this control transfer is an essential part of the semantics, and it is consequently typically a mistake in either situation for the case set to be exhaustive.

(So, to summarise, my revised view is that (1) it would be useful to have both nested match-binding and nested let-binding (2) I’m not aware of any serious objections to the design (3) the compiler should check for exhaustiveness of inner case sets, and warn if they are found to be exhaustive.)

(This post discusses concrete syntax only.)

Two syntactic constructs have been discussed here, one for let-binding and one for pattern-matching (both nested within a pattern):

p and e with p                (* p:pattern, e:expression *)
p with ip = e                 (* ip: irrefutable pattern *)

(I’m implicitly considering the generalization where any pattern position (p, not ip) may contain a handler, so patterns and handlers are the same syntactic category, with the restriction that there is at most one return clause per match imposed by an additional well-formedness condition.)

I think that both concrete syntaxes are sensible, but I’m not sure how well they work together, because they both use the keyword with for fairly different things. I have been trying to find alternative syntaxes that would allow to mix the two constructs (I’m also now reasonably convinced by Leo and Jeremy’s approach of having an explicit syntactic distinction between intentionally-exhaustive and intentionally-partial constructs) without this unhappy collision. Some proposals below:

p and e with p
p and let ip = e

p and e with p
p and e as ip

p and e as p
p with ip = e

p and match e with p
p and let ip = e

Unfortunately I like none of these proposals more than the current proposal. The use of as is sort of reasonable, but I think as should ultimately be a symmetric pattern-intersection construct (and it’s not symmetric here). The and match and and let are very clear and have few opportunities for syntactic conflict, but it’s also fairly heavy syntactically. If we went that way, we would probably need lighter sugar for common occurrences.

(The two distinct uses of with exist in OCaml today (with in patterns, with in record update), but not so close together.)

Folks, do you think that extended pattern matching clause should be allowed in the same place as when clause (the downest pattern with RHS expression)? Maybe it will be useful to have it near all subpatterns of or- pattern? What was the original motivation about forbidding when nearby every pattern?

Currently when guards must be at the toplevel of a pattern only, and this makes them easier to compile. It’s harder to allow arbitrary interleaving of tests (and term evaluation) and patterns, as proposed here. Extending when to work deep inside patterns is essentially as hard to implement as the proposal in this thread, but less expressive (p when e can be seen as syntactic sugar for p and e with true), so it is subsumed by this discussion.

It seems I didn’t expressed my idea clearly. At the moment near or-patterns we allow when clasue only near RHS:

# function Some x when (x > 0) | None -> "";;
Error: Syntax error
# function Some x when (x > 0) -> "" | None -> "";;
- : int option -> string = <fun>

Is it a theoretical limitation or it wasn’t handy to implement? Should we allow extended patterns everywhere or use the same limitation as for when clauses? While touching relevant code base should we allow (or prepare to allow) when's in matching clauses without RHS expression?

It seems I didn’t express my answer clearly, because I understood your question and tried to answer it.

The limitation of when clauses comes from the fact that it would be harder to reason about and compile the non-limited version.

The “extended patterns” features discussed here cannot be limited in this way (they always imply matching after the evaluation of a term), so the same difficulties (in reasoning/analysis and in compilation strategy) must be lifted here. They are also more general than when which can become simple syntactic sugar for these features (and in particular, could be allowed in depth).

I will start with a different idea that fits the title of this post, and then I will show that it is more on-topic than that by going back to the discussion about concrete syntax.

Rust has “if lets” (reminiscent of C++'s if-declarations), which allow writing something like: (OCaml pseudo-code)

if let Some x = e then f else g

Above, the scope of x is f. In Rust this is useful because the pattern-matching syntax is a bit more verbose so this saves a greater amount of keypresses and eye strain.

I have been wanting to write if-lets for a while in OCaml, for a slightly different reason: match-with does not play nice with semicolons. Compare:

d ;
(match e with
  | Some x -> f
  | _ -> ()
) ;
g

or even:

d ;
let () = match e with
  | Some x -> f
  | _ -> ()
in
g

with:

d ;
if let Some x = e then f ;
g

(This appears to be already properly understood in emacs indentation-wise, so you can try by yourself.)

This would lift one oddity of let-bindings: although it has always been possible to write let A x = ..., unless this is a single-variant type or you are a proof assistant generating OCaml code it is currently not possible to use it because of the exhaustiveness warning and the Match_failure exception.

(I imagine one natural objection, which might explain why if it has been thought about in the past it was dismissed, is that it can lead beginners to write code like:

let x = e in
if let A y = x then ...
else if let B y = x then ...
else if let C y = x then ...

where they lose exhaustiveness checking. So the language is designed to guide them towards match-with. I’ll reuse @yallop’s argument, that this form is for when the working programmer intentionally avoid exhaustiveness checking, and that one can even explore adapting his radical proposition to warn if it is exhaustive by accident. A convincing special case of warning is the accidental if let A x = e then ... where A is the single variant.)

Unfortunately, and in let-bindings does not nest scopes, e.g.:

# let x = 3 and y = x;;
Error: Unbound value x

It would be very confusing to introduce a similar syntax with very different scoping rules. (Other syntaxes using and proposed above have the same issue as noticed by @recoules.)

One can solve this by forcing to repeat let after and. So one would be able to write:

if let A x = e and let B y = f(x) then g(x,y) else h

which would shorten the following:

match e with
| A x -> (match f(x) with B y -> g (x,y) | _ -> h)
| _ -> h

(note the duplicate h).

One benefit of having to write the full and let is that one can finish with a conditional expression:

if let A x = e and let B y = f(x) and cond(x,y) then g(x,y) else h

To sum up, the syntax for ifs becomes:
if’ cond ‘then’ expr ‘else’ expr
where:
cond ::= expr | ‘let’ let-binding [’and’ cond]

This mostly reuses existing syntactic forms, and I believe this does not introduce grammatical ambiguities.

What’s more, the idea can be extended to other places expecting a boolean expression: while and when. In other word, the same idea gives you the let-bindings nested inside match, one of the extended pattern-matching syntax discussed above. (As I promised at the beginning.)

For instance @trefis’s example becomes:

match f () with
| Some y when let A x = g y and let Ok result = h x -> result
| _ -> raise Not_found

In this particular case this can further be simplified (depending on the context) as:

if let Some y = f () and let A x = g y and let Ok result = h x
then result
else raise Not_found
1 Like

Side note: there have been if-let proposals to OCaml:

2 Likes

Excellent, time to revive them. To keep the topic focused on entertaining ideas without planning to implement them, the additional proposal of being able to nest let-bindings is in fact an increase of expressiveness, as explored higher.

Note that @yallop’s suggestion to warn if the matching is exhaustive is in fact a warning that the default case is unused (warning 11), which may be a more natural way to put it.

@gadmm’s post contains a few remarks on concrete syntax (in particular a criticism of and, and then when let suggestion) which suggest using when match as a keyword for non-exhaustive nested matching.

p when match e with p'

That might clash with the existing:

let foo p =
    match p with
    | Some x when match x with
                  | 0 -> true
                  | _ -> false -> "yes"
    | _ -> "no"

although maybe LALR can manage to distinguish them if you are careful in how you set up the grammar.

Indeed – and when let has the same problem. I think the LR parser would probably be okay, but it may be an inconvenience for humans (those that are more LL).

I support the if let syntax proposal. Using them in Rust and it’s very convenient.

To me, it seemed that being able to recognise let x = a -> ... is no different from being able to distinguish let x = a in ... from let x = a ;; ... or from let x = a type ... or from let x = a let .... Can you please explain (to somebody who skipped grammars at school) what is the subtlety with when let?

As for humans, I am not too worried, because these are known to resort to tricks like using unexpected combinations of non-meaningful characters to hint about meaning (“formatting”) and use complex sociological processes to distribute the knowledge of those combinations among groups of humans (“conventions”) :wink:

2 Likes

Hello, Discourse!

We reviewed this thread quite carefully along with features of other languages and summed up the results here.

We also started an active patterns RFC discussion.