Locally Abstract Types in an Or-pattern with a GADT Scrutinee: Clash during Refinement?

Patterning matching with or-pattern, locally abstract types and GADT has some curious semantics.

Consider the code below , which is a standard implementation of type witness and equality witness, with an additional problematic implementation that simply tries to merge two similar branches of pattern matching into a single branch using an or-pattern. The two parts of the or-pattern: (Int, Int) and (Char, Char) make incompatible refinements for the locally abstract types a and b. The former refines them both to int, whilst the latter — char. The machine inferred type, as shown by the error message, for Refl is (a, a) equal ,which is not compatible with the annotated return type (a, b) equal, therefore the definition is rejected.

The question:

How exactly the type (a, a) equal is inferred for Refl?

My own answer to question 1: Start with the result ('c, 'c) of anti-unification of the tuples (int, int) and (char, char), followed by a conversion from free type variable to abstract type: ('c, 'c) -->> (a, a). But why a? Why not b nor a fresh locally abstract type? Why the unnecessary conversion ('c, 'c) -->> (a, a)?

I am also wondering where can I find from the ocaml source, about code that implements pattern matching and locally abstract types?

module Ty  : sig
  type _ t = Int : int t | Char : char t
  type (_, _) equal = Refl : ('a, 'a) equal
  val ty_equal : 'a t -> 'b t -> ('a, 'b) equal option
end = struct
  type _ t = | Int  : int t
             | Char : char t

  type (_, _) equal = Refl : ('a, 'a) equal

  let ty_equal (type a b) (a : a t) (b : b t) : (a, b) equal option
    = match a , b with
    | Int , Int -> Some Refl
    | Char, Char -> Some Refl
    | _, _ -> None

  module Problem = struct
    let ty_equal (type a b) (a : a t) (b : b t) : (a, b) equal option
      = match a , b with
      | (Int , Int) | (Char, Char) -> Some Refl
      | _, _ -> None

This is a known limitation of or-patterns with GADTs inside: type equations (learned by matching on a GADT constructor) from within the or-patterns are not propagated outside.

My guess about what happens in your example:

  • In the (Int, Int) case, we learn two equalities a = int and b = int, so we have the equalities a = b = int in scope in the right-hand-side. This occurrence of the constructor Refl has type ('a, 'a) eq for some inference variable 'a, in particular checking Refl : (a, b) eq is valid. (In this scope of equalities, a and b are indeed the same type.)
  • In the (Int, Int) | (Char, Char) case, we learn no equalities, so we have the distinct types a and b in the right-hand-side. This occurrence of the constructor Refl has type ('a, 'a) eq for some inference variable 'a, checking Refl : (a, b) eq starts by unifying 'a = a, which succeeds (at this point the constructor has type (a, a) eq), and then 'a = b, which fails as a and b are assumed to be distinct.

Thanks, @gasche , it explains !

Here is an idea of a fix, which is also an or-pattern semantics for GADT.

Instead of learning no type equation from the or-pattern (Int, Int) | (Char, Char) can we learn the type equation a = b = int (call it eq1) from the first disjunct (int, int), and learn a second type equation a = b = char (eq2) from the second disjunct (char, char), and maintain a list of type-equation/state pairs

[ (eq1, state1) ; (eq2, state2) ]

where only one of the eq1 and eq2 is active at a time, indicated by the boolean value of state1/2. Only the active equation is in the scope in the right hand side. Then type checking of Refl proceeds as follows: Initially, state1=true, state2=false, and Refl has type ('a, 'a) eq for some inference variable 'a, and checking Refl : (a, b) eq is valid because in the scope of eq1 which is active now, a and b are indeed the same type int. Next, set state1=false, state2=true to activate eq2 (and deactivate eq1), and undo the binding of the inference type variable 'a, and redo the type checking, and this time ('a, 'a) eq the inference type of Refl again unifies (a, b) eq because of eq2. This agrees with the type checking logic where the or-pattern are breaked into separate branches as in the non-problematic code.

I don’t understand the full details of your proposal; it sounds like you are proposing to type-check the right-hand-side twice, but then I don’t know why one should propagate a “state” when checking the right-hand-side, using the normal type-checking process should suffice.

We don’t want to duplicate the checking of the right-hand-side, because:

  1. It corresponds to macro-expanding or-patterns instead of understanding them as a proper language feature.
  2. Such backtracking may create performance issues in some cases where several or-patterns are used, possibly nested, inside the same pattern. (Say (A|B|C),(A|B|C),(A|B) -> ...).
  3. Type-checking the same expressions several times interacts very poorly with other aspects of the type-checking machinery, for example the ones that raise warnings during type-checking, now warnings typically get duplicated.

Perhaps the rule-of-thumb for a casual GADT user is just to avoid using or-patterns.

I had a look at the discussions on GitHub (link below) regarding this issue, and could see that there are indeed implementation-related intricacies yet to be resolved.

It is surprising that such a seemingly / intuitively simple problem could have such challenging technical implications.

Personally, the typing rule that I would ideally expect is that the intersection of equations on both sides is taken. This is less expressive than duplicating type-checking, but it is enough for many examples including this one, and I think the behavior is reasonably intuitive. (This is also how it could work for term variables.) However, as you point out, this is not easy to implement, so currently we don’t keep any equation at all.