Type checking pattern matching on polymorphic variants

let x () = `A

let _ =
  match x () with
    | `A -> ()

let _ =
  match x () with
    | `A -> ()
    | `B -> ()

In this program I’m wondering when does the type checker “close” the variant in the return type of x (), to conclude that the first pattern match is exhaustive, while also allowing the second pattern match which has a clause that will never match.

My understanding is OCaml uses row polymorphism under the hood. So the type of x is something like: (not in OCaml syntax)

forall (r : row) . () -> [A, ..r]

(where ..r indicates row extension)

So when we instantiate the type in a use case we get [A, ..r1] (where r1 is a unification variable).

In the first pattern match, r1 needs to be unified with an empty row for the type checker to conclude that the pattern match is exhaustive.

In the second pattern match, it needs to do the same, but after unifying the type of pattern B with the scrutinee type.

When exactly unification of the row variable with empty row is done?

My guess: it type checks this as usual (for each branch, unify LHS with the scrutinee type, unify RHS with the type of the whole expr), then “closes” the variant type to check for exhaustiveness.

Am I correct?

The type of

let match' = function 
| `A -> () 
| `B -> ()

is [< `A | `B ] -> unit on a whole and such pattern-match is exhaustive by definition.
In other words, the match is typed as a whole.

Thus in

let _ = match' (x ())

the type [< `A | `B] is unified with the type [> `A ] yielding the type [< `A `B > `A ]

Yes, the question is when exactly the variant type becomes < from >.

In your example:

let match' x =
    match x
    | `A -> ()
    | `B -> ()`

I start by creating a fresh unification variable for the argument x (to be generalized later).

For each branch of match, I check the pattern type and unify it with the type of x.

So goes like:

  • x : $a ($a is fresh unification variable)
  • unify $a with type of 'A: $a ~ [> A].
  • unify $a with type of 'B: $a ~ [> B].

In the end the gives me x : [> A | B].

When does this type become [< A | B]?

I could also type this function as [> A | B] -> unit and pattern match would become non-exhaustive:

let test (x : [> `A | `B]) : unit =
    match x with
    | `A -> ()
    | `B -> ()

But the inferred type is not this. So the type is being converted from [> ...] to [< ...] at some point. The question is when?

1 Like

The actual typing of the pattern is more involved, but yes the typechecker decides to keep open inferred variant types or not in order to improve the exhaustiveness of pattern matching

More precisely, the type of the single pattern `A might be closer to [?`A | 'r ] where ?`A means that `A can be either present or absent and 'r is the row variable.

In the first step, the unification of the pattern type then goes through each row of the pattern match, giving you a temporary type [ ?`A | ?`B | 'r ].

In the second step, after analysing the pattern matrix, those inferred polymorphic variant types are kept open if this doesn’t make the pattern matching not exhaustive.
For instance, in

let f x y = match x, y with
| `A, `X -> ()
| _, (`X|`Y) -> ()

the pattern is still exhaustive if the type of the first x column is kept open, but the type of the second y column must be closed. Consequently the type of f is inferred as [? `A | 'r1 ] -> [< ? `X | ?`Y | 'r2 ] -> unit. The row variable 'r2 is still here because we could infer later than in fact `X or `Y is present or absent in the type.
At this point, for closed polymorphic variant types, we are back to the surface language for polymorphic variants: [ < ?`X | ?`Y | 'r2 ] can be read as [< `X | `Y ].

Finally as a last step, we go back to the surface presentation for the open variants by marking as present all constructors appearing in them (if there are no conjunctive types). Thus [? `A | 'r1 ] becomes [> `A ].

1 Like