Mapping GADT with Polymorphic Variant into a more restrictive form

Hello! Suppose I have the following GADT in Ocaml:

type _ t = 
  | A : f t
  | B : f t * f t -> f t
  | C : c t * c t -> f t
  | D : [< f | c] t * c t -> c t
  | E : c t
  | T : string -> [< f | c] t

where f and c are polymorphic variants containing a single constructor:

type f = [ `f ]
type c = [ `c ]

This allows me to combine these constructors in very specific ways. For example, constructor B does not allow any constructor that does not have type f t, while the first parameter of constructor D may accept any constructor of type c t OR f t. Next are a few examples of terms that are achievable using this type

A
B (A, B (A, A))
E
T "test"
D (A, E)
D (T "hello", T "world")

Now, I wish to write a mapping function that essentially “removes” the constructor T out of a term of either type f t or c t. When the constructor T is encountered, it should lookup for a value (which can be of either type f tor type c t) on a mapping environment (which right now we can assume to be a list of associations from string to any constructor of the aforementioned type). Now, as far as I have tested, this seems to be somewhat impossible (or at least very tedious to write) for two main reasons:

  1. inconsistency from constructor D: if we have the term D (T "hello", E), T "hello" could yield a value of either type f t or c t, depending on what the binding for “hello” present in the environment is;
  2. type signatures: even if we eliminate constructor D, a term T "hello" may still yield different results depending on what the environment contains. At the end of the day, constructor T has a type [< f c] t while the result should be either f t or c t

It is utterly possible that I’m overcomplicating this problem, and its solution could as well be trivial whilst being completely blind to it, and its nearly certain I’ve delved so deep into this issue I no longer understand the mechanisms behind the solutions I’ve attempted. However, I found this to be an enticing puzzle. Thus, I have spent a fair amount of time developing some solutions, all of which have failed me in some way or another:

  1. Separating the f t and c t types into their own separate types: However, constructor T should be present in both types, and because they would have identical functionality during the mapping process, I do not wish to add two distinct constructors for this instance. Furthermore, this solution is contingent on having two different mapping functions (one for each type), which is something I’m trying to avoid (this was actually the initial state of this system/mess and the main reason why I tried to work with GADTs and polymorphic variants in the first place);
  2. Replacing the GADT for polymorphic variants: This would allow some sort of subtyping - if we have two polymorphic variant types fpv and cpv for their corresponding “base types”, I could make my mapping function work on these polymorphic variants in addition to the constructor T, while returning only the polymorphic variants (something akin to having the signature f : env -> [< fpv | cpv | `T of string] -> [< fpv | cpv]). However, I have constructors that would depend on these polymorphic variants (i.e, at least one of its arguments will require an argument [< fpv | cpv]), which seems to be impossible with the typechecker in hand;
  3. Keep the GADT, but having separate mapping functions for f t and c t: However, this would not work for constructor D (since we would not know which function to call (and no, I cannot add another parameter to this constructor that would explicitly tell which function we should invoke - i already tested that as well)), nor would it work for a term T "hello" for the same exact reason;
  4. Mapping from this GADT onto a new one without the constructorT: It works for most constructors (assuming we make use of type signatures, which I do not have a problem with), however it does not work for the T constructor, since the resulting GADT does not contain any constructor with type [< f | c] t

I feel like I’m running out of options to work with, and I don’t know what to do next. What else can I try? If this is indeed a trivial problem but I’m simply lacking vision, please do provide me with some resources for me to read and to gain some inspiration. At this point, I’m willing to try (almost) anything.

1 Like

It seems that you have a fundamental issue. The following term is accepted:

B (B (A, T "foo"), C (E, T "foo"))

However you cannot find any term t that could replace both occurrences of T "foo" (unless you use another T constructor).

If you expect that you won’t have to deal with such terms, you can re-type your terms during replacement. Here is something that seems to work for me:

type f = [ `f ]
type c = [ `c ]

type _ t =
  | A : f t
  | B : f t * f t -> f t
  | C : c t * c t -> f t
  | D : [< f | c] t * c t -> c t
  | E : c t
  | T : string -> [< f | c] t

type _ wit =
  | Wc : c wit
  | Wf : f wit

type t_with_wit =
  | Pack : 'a t * 'a wit -> t_with_wit

type env = (string * t_with_wit) list

exception Type_error

let rec replace : type a. env -> a t -> t_with_wit = fun env t ->
  match t with
  | A -> Pack (A, Wf)
  | B (x, y) ->
    let Pack (x, x_wit) = replace env x in
    let Pack (y, y_wit) = replace env y in
    begin match x_wit, y_wit with
      | Wf, Wf -> Pack (B (x, y), Wf)
      | _, _ -> raise Type_error
    end
  | C (x, y) ->
    let Pack (x, x_wit) = replace env x in
    let Pack (y, y_wit) = replace env y in
    begin match x_wit, y_wit with
      | Wc, Wc -> Pack (C (x, y), Wf)
      | _, _ -> raise Type_error
    end
  | D (x, y) ->
    let Pack (x, x_wit) = replace env x in
    let Pack (y, y_wit) = replace env y in
    begin match x_wit, y_wit with
      | Wc, Wc -> Pack (D (x, y), Wc)
      | Wf, Wc -> Pack (D (x, y), Wc)
      | _, Wf -> raise Type_error
    end
  | E -> Pack (E, Wc)
  | T s -> List.assoc s env

let expect_type : type a. a wit -> t_with_wit -> a t =
  fun wit t_wit ->
  match wit, t_wit with
  | Wc, Pack (t, Wc) -> t
  | Wf, Pack (t, Wf) -> t
  | _, _ -> raise Type_error

let replace_same_type : type a. env -> a t -> a t =
  fun env t ->
  match t with
  | A -> expect_type Wf (replace env t)
  | B (_, _) -> expect_type Wf (replace env t)
  | C (_, _) -> expect_type Wf (replace env t)
  | D (_, _) -> expect_type Wc (replace env t)
  | E -> expect_type Wc (replace env t)
  | T _ -> (* Cannot infer expected type *) raise Type_error

You will note that it’s not possible with my solution to use replace_same_type on terms of the form T _. This is because with such a term, you cannot know what the expected type was, so whatever term you replace it with, you cannot guarantee that it will have the same type. However, as long as the top constructor is not T, everything works.

I’ve tried a version where the constructor T carries a type witness, and it seems like it should work but fails because of what I see as limitations in the type inference of GADTs with polymorphic variants (it works if f and c are not polymorphic variants).
Here is the version that works, without polymorphic variants:

type f = private F
type c = private C

type _ wit =
  | Wc : c wit
  | Wf : f wit

type _ t =
  | A : f t
  | B : f t * f t -> f t
  | C : c t * c t -> f t
  | D : _ t * c t -> c t
  | E : c t
  | T : string * 'a wit -> 'a t

type t_with_wit =
  | Pack : 'a t * 'a wit -> t_with_wit

type env = (string * t_with_wit) list

exception Type_error

let rec replace : type a. env -> a t -> a t = fun env t ->
  match t with
  | A -> A
  | B (x, y) -> B (replace env x, replace env y)
  | C (x, y) -> C (replace env x, replace env y)
  | D (x, y) -> D (replace env x, replace env y)
  | E -> E
  | T (s, wit) ->
    let Pack (t, t_wit) = List.assoc s env in
    begin match wit, t_wit with
      | Wc, Wc -> t
      | Wf, Wf -> t
      | _, _ -> raise Type_error
    end
2 Likes

Ah, I actually got it to work with the polymorphic variants:

let rec replace : type a. env -> a t -> a t = fun env t ->
  let[@local] cast : a wit -> t_with_wit -> a t = fun w (Pack (t, t_wit)) ->
    match w, t_wit with
    | Wc, Wc -> t
    | Wf, Wf -> t
    | _, _ -> raise Type_error
  in
  match t with
  | A -> A
  | B (x, y) -> B (replace env x, replace env y)
  | C (x, y) -> C (replace env x, replace env y)
  | D (x, y) -> D (replace env x, replace env y)
  | E -> E
  | T (s, wit) ->
    let v_s = List.assoc s env in
    cast wit v_s

Although note that with the extra type witnesses, you can’t have a term D (T "foo", E) and expect it to work if you replace T "foo" by either A or E. You have to choose in advance which type the T terms will actually represent.
It’s not that different from having two constructors T_c : string -> c t and T_f : string -> f t, except that you can factorise a bit some of the code.

2 Likes

Thank you for your help! I did come up with the solutions you proposed a few days ago (I simply forgot to talk about them more in depth due to time constraints, so I apologize for that), especially the one with polymorphic variants, but I deemed them not satisfactory enough for two main reasons:

  1. I really need to use the replace function with terms of type T _. This comes from a previous part of the system which can produce terms of type T _. In this scenario, the mapping function should simply retrieve whatever is in the environment. So, for example, T "hello" could either return an f t or a c t term, depending on what is present in the environment (i.e, the mapping function is contingent on runtime information);
  2. Indeed the term B (B (A, T "foo"), C (E, T "foo)) is to be accepted in the sense that it is a well formed term. However, during the mapping operation, this term should be rejected by an exception or some other error handling method, since “foo” cannot map to both an f t AND a c t term. What I had imagined was for the function to recursively call itself onto the constructors arguments. When it reached the first T "foo" constructor, get what is in the map and check to see if it is an f t term. When encountering the second T "foo" term, it would do the same exact thing, with the exception of testing whether its mapping is a c t term. One of these would fail, which would then throw and exception and stop the execution of this function.

One idea I also had but didn’t manage to get working was to provide the function with an optional “scope” argument, which would tell the function whether we were looking for an f t, a c t or really any _ t term. This seemed clunky and I couldn’t manage to get it working with the function’s type signature. But again, if anyone has any other possible solutions, I’m all ears.

Sorry, I don’t understand exactly what the problem is with the replace function from my first code block. It seems to me that it matches your requirements, can you explain what is missing ?
The only issue I could find is that it doesn’t have the type env -> 'a t -> 'a t, but you haven’t mentioned that as a requirement (and I don’t think it is possible to write a function with such a type that handles raw T _ constructors correctly).

After finally resting and clearing up my mind about this, I actually realized there was absolutely nothing wrong with this implementation. Not only that, but it was actually a solution I was looking for (specifically the one without type witnesses, which could pose issues with raw T _ constructors of unknown underlying type)! One cool thing about this solution is that it actually handle raw T _ constructors correctly (well, at least according to my definition of “correctly” for this particular problem): if I call replace [("foo", Pack (A, Wf)] (T "foo") the result will be Pack (A, Wf), and if I call replace [("foo", Pack (D (A, E))] (T "foo"), the result will be D (A, E). I had thought of this solution for a prior problem of mine (with this exact program actually), but I discarded it because it seemed clunky. However, you’ve shown me it doesn’t look all that bad actually. It also helped that you made the separation between mapping and expecting types of a specific witness type, which actually could further simplify the remaining code a lot.

Just to clear some things up, there was indeed no issue with the function not having the type env -> 'a t -> 'a t, and indeed the type B (B (A, T "foo"), C (E, T "foo")) is to be accepted up until the point where we have enough information to make an assertive decision on what “foo” maps to.

As it turns out, having one extra layer of indirection was the solution for this problem. Thank you so much for your help!