"Occurrence exhaustiveness"


I just wrote a family of functions that operate over some type, e.g.

type t = A | B | C

A is a happy path of sorts, and can be handled easily and cheaply; others need much more complicated “clean up” of sorts. So, most of the functions I wrote had a structure like so:

    let foo x y t =
      match t with A -> (x, y) | _ ->
      (* costly computations over x, y *)
      match t with
        | A -> (x, y)
        | B -> (* computation *)
        | C -> (* computation *)

It’s unfortunate that I had to repeat the A case anywhere after that first match to satisfy the exhaustiveness checks.

There’s a notion in Typed Racket called “occurrence typing”, where within the then branch of a conditional (for example), the type of a checked value is narrowed based on the predicate used.

It seems that something similar would be very helpful in cases like this, where in the second arm of the above example’s first match, the effective type of t could be narrowed to type t = B | C. The benefits might be amplified to whatever extent functions like foo call hidden helper functions with values of narrowed types.

Is this something that has been discussed or proposed previously?


I don’t understand why you need to duplicate your match t here. It would make sense if you had a match t followed by a match s, but why ask the same thing twice ? The second time, you already have it.


The first match t is to allow for the early return for the A case, avoiding the subsequent work needed for the other cases.


You can do some of this kind of refinement with polymorphic variants. Using them, foo might look like this:

let foo x y t =
  match t with
  | `A -> (x, y)
  | (`B | `C) as t -> 
    (* 'as' refines t to what is matched by the or-pattern *)
    (* costly computations over x, y *)
    match t with
    | `B -> assert false
    | `C -> assert false

You can improve on this a little by using named sets of polymorphic variants by name with the #name syntax, which gives better error messages and avoids having to type the same or-pattern every time.

Another fairly obvious alternative is to restructure the type: type bad = B | C type t = A | Bad of bad provides a natural split between favourable and unfavourable cases. This does not always work neatly, and you only get one split per data type before the problem crops up again. It can also induce administrative wrapping and unwrapping work for both the programmer and the machine. Still, because it is so straightforward it is always worth considering.

GADTs can be used for some kinds of refinement, but that introduces its own complexities.


I think I would write it as follows:

let foo x y t =
  let costly_computations x y = (* costly computations over x, y *) in
  match t with
  | A -> (x, y)
  | B -> costly_computations x y; ...
  | C -> costly_computations x y; ...


Same for me : writing the same function twice is definitely below my pain threshhold. Having a builtin syntactic sugar to deal with this is nice, but I wouldn’t spend time trying to implement it if it isn’t already there.


Well, yes, if things were actually that simple, then I would do exactly this. The example I provided was an illustration, not actually representative of the complexity.


Thank you, that’s very helpful.