Does using object types in GADTs get you any more power?

I recently learned that you could use object types in GADTs! In this example it’s just used as a “type-level record”, which if I understand correctly means that it isn’t much more powerful that using * or -> types, with the only real difference being that you can name the types.

On first blush this seemed like a very powerful technique to me—I vaguely understand that OCaml’s type system for objects is pretty different from the rest of the language, with objects having row polymorphism / subtyping. But upon experimentation I couldn’t find any way of using this: every weird GADT I could come up with could be replicated only using * types.

Is this technique just not as powerful as I thought, or am I missing something?

1 Like

I found watching three minutes of https://youtu.be/pO8Z4lcY0ys?si=AQ1xqRsqCmJ5iZ4e&t=470 helpful (7:50 to approximately 11:00). That video link came from A pattern for using GADTs to subset cases - #2 by octachron

1 Like

As explained in the talk of @AltGr about Catala, this helps when e.g. you have two or more ASTs that are quite close but you don’t want to:

  • duplicate code on them (e.g. a printer);
  • don’t want to merge them a single ASTs because it implies having to deal with case that should have disappear after some steps.

This is done through sub-typing which you can get with GADTs + either object types or polymorphic variants. It appears that object types are more pleasant to use in this case (for instance, error messages are better).

I recently started to port the code of Owi with this technique and was surprised about how well it went. IIRC, one thing that is not mentioned in the talk but that is quickly needed is to use as 'a.

Louis also gave a talk at OUPS and as an answer, at the next meetup, Boris Yakobowski gave another talk about an alternative they use at Adacore. It is in the case of their compiler where they have 48 micro-steps, each one creating a different type of ASTs. One notable thing is the use of a ppx to help. We are going to put the video online at some point here (but the talk was in french).

4 Likes

GADTs are incompatible with object subtyping: they are always invariant with respect to their type parameters. For instance,

type +'a opt = None | Some: <x:'a> opt

fails with

Error: In this GADT definition, the variance of some parameter
cannot be checked

because GADTs are concerned with equality, and subtyping coercion and equality relationship don’t play nice together.

The use of object types in Catala is purely optional, but using object types at the type-level record provides you:

  • named type parameter <x:int; .. >
  • object type definition inlining:
type t = <x:int; y:int>
type more = <t; z:int >
  • row variable polymorphism for constraints that work if you keep them far away from GADT equations

In other words, object types afford you a better programming experience at the type level but they don’t give you more expressiveness power.

And the brittleness of the last point is the reason why using object type is often better than polymorphic variant for this use case, since object types makes it easier to not introduce long-lived row type variable that will clash later with a GADT equation.

3 Likes

GADTs are incompatible with object subtyping

Sure, I did not say they are. :slight_smile:

I probably used the word sub-typing incorrectly but isn’t row polymorphism a form of sub-typing, as explained in Structural Subtyping as Parametric Polymorphism for instance ?

Row parametric polymorphism can be quite similar to subtyping. However, the presence of GADT magnifies the difference between the two, because GADTs equations will often freeze the row type variable at the most inopportune time, and break the expected subtyping relationships.

Consider for instance, the toy language:


type _ expr =
  | Int: int -> <bool_like:yes; num:yes; value:yes; ty:int > expr
  | Float:
      float ->
      < bool_like:no; num:yes; value:yes; ty:float > expr
  | If:
     <bool_like:yes; num:_; value:_; ty:_ > expr
    * (<bool_like:'b; num:yes; value:_;ty:'ty> as 'e) expr
    * 'e expr
    -> <bool_like:'b; num:yes; value:no; ty:'ty > expr
  | Add:
      (<num:yes; bool_like:'b; value:_; ty:'ty> as 'e) expr
      * 'e expr
      -> <num:yes; bool_like:'b; value:no; ty:'ty> expr

One can write an interpreter to the syntactic values of this language with

let rec value: type b n v ty.
    < bool_like:b; num:n; value:v; ty:ty> expr
    -> <bool_like:b; num:n; value:yes; ty:ty > expr =
  function
  | Int _ as i -> i
  | Float _ as f -> f
  | If (cond,l,r) ->
     let cond = match value cond with
       | Int x -> x <> 0
       | _ -> .
     in
     if cond then value l else value r
  | Add (x,y) ->
      match value x, value y with
     | Int x, Int y -> Int (x+y)
     | Float f, Float g -> Float (f+.g)
     | _ -> .

Looking at the definition of If, one might be tempted to simplify it to:

  | If:
     <bool_like:yes; .. > expr
    * ...

since what matters is that the type is bool_like. However, this is not equivalent simplification, and the previously correct value cannot be typechecked anymore

   |      let cond = match value cond with
                                 ^^^^
Error: This expression has type < bool_like : yes; .. > expr
      but an expression was expected of type
        < bool_like : 'a; num : 'b; ty : 'c; value : 'd > expr
      Type $a is not compatible with type < num : 'b; ty : 'c; value : 'd >
      Hint: $a is an existential type bound by the constructor If.

The mysterious error message is due to the fact that in the new definition of If

  | If:
     <bool_like:yes; .. > expr

the row type variable is now an existentially quantified type variable. And such existentially quantified row type variable cannot be extended with fields anymore. Thus when we extract the cond term in

  | If (cond,l,r) ->
     let cond = match value cond with
       | Int x -> x <> 0
       | _ -> .
     in
     if cond then value l else value r

we are not allowed to inspect any of the other type tag of the condition, which fails because value require all type tags to be available.

In this case, we can recover the simplification by making sure that we don’t have an existentially quantified row variable with:

type 'a tagged = 'a
  constraint 'a = <bool_like:'b; num:'n; value:'v; ty:'ty>
...
  | If:
     <bool_like:yes; .. > tagged expr

But this means that we are keeping away from any subtyping through row type variable.

In other words, subtyping through row type parametric polymorphism doesn’t work reliably in presence of GADTs, and my advice is to avoid it at the start rather than eliminate it later on after many fights against existentially quantified row type variables.

4 Likes

I’m not sure the problem is really due to GADT, but to a bad use of subtyping (so the error message).

Consider these two types:

type 'a t = <bool_like:yes; ..> as 'a
type a = Any : 'a t -> a

The last one is equivalent to the closed object type:

type b = <bool_like:yes>

What we got with row type polymorphism ('a t) is that we know we got a subtype 'a of b. And in the code of value we need a subtype of <bool_like:yes; num:'b; ty:'c; value: 'd>. But, from a subtyping point of view, we only know that some b (or a) are of this type, and not all of them. Hence the type-checker, from a subtyping point of view, can’t prove that 'a is of the right type.

Edit: Row type polymorphism can be use with GADT, under the condition that the row type varibale appears in the type parameters of the GADT. Otherwise, when you hide the row type under an existential, you are upcasting your value in a closed object type (so you don’t really need row type polymorphism) and, therefore, you can’t specialize or downcast it anymore.

There are domains like game development, where the use of classes as an extensible record templates that could actually maintain some invariants upon instantiation is very natural approach. While I could simulate all of these features with some property lists and a myriad of record types and a very clever use of currying, the use of classes feels both natural and also extensible.

Currently, I am looking forward to solve a problem of how a representation of data based on GADT variant types could be extended incrementally in modules. I have a feeling – but not quite a solution – that it may be possible to accomplish this through the use of object system in OCaml. If someone has an insight on this matter, I would appreciate this.

So, what is the ultimate idea of this approach? Is it to use a single type definition that could accommodate both ASTs? And use phantom type subcasing to traverse only portions of the cases of the tree spanned by this single type?

Yes, that’s the idea. It can even be used when you have more than two ASTs (in Owi, we have 5 or 6 IIRC, and they share most of their definition).

I gave a thought about how in principle my case could be approached. It seems that we have two distinct methods, which could be employed simultaneously in a single program, with a third emerging as an extension:

  1. Use basic polymorphic variant tagging with encoders in separate signatures or modules hidden by an opaque type. The processors need only be aware of the context they are working with, and the knowledge of all relevant game types need not be declared at the type level as a closed type (thus, non-incrementally).
  2. Use centralized processing functions. Here, it is distinctly possible to take advantage of non-polymorphic type variants. It is easier to work with such approach when the types used for tagging data are closed — which implies that the knowledge of all relevant game types needs to be declared at the type level as a closed type.
  3. However, it is also distinctly possible to create an encoding where the variant types used for tagging data are actually open types. It’s a slightly more complex approach, since we’d need to construct our pattern-matching functions from implementations in several modules, and also we’ll need some additional type arguments to ensure that only the tags known to the processing function are allowed to be given to it.

Now, the implementation techniques for cases (1) and (2) are relatively obvious. A more interesting question is case (3). Could we hope for any efficiency in case we’re constructing our pattern matching from functions in several modules, a la:

  (* somewhere within the processing function *)
  | Int -> do_something_with_int data
  | Float -> do_something_with_float data
  | _ -> try_next data

  (* where `try_next_data` has the form similar to the one above,
     i.e. it is essentially a large pattern-matching expression with
     a functional application (e.g. `try_next_data`) at the tail *)

Could we hope that the compiler would turn this definition into a single pattern-match in the end?

Note how we have an increase in the number of needed type variables in the data tag specifications when we move from (1) to (2) to (3).

  1. In this case, we need a definition of the type 'a t, since the tags are polymorphic, and could easily be subset.
  2. Here, we need an extra type variable to express subsetting, thus ('a , 'b) t, where 'a is a data type, and 'b is probably some polymorphic variant tag type or such (e.g. [> `Int | `Float]).
  3. Here, we need to introduce an additional parameter to prevent inclusion of tags from unknown modules. So, we have ('a, 'b, 'c) t, where 'c designates associated module or modules used in the definition of the specific instance of processing function that uses these tags.

What do you think about these approaches? Are there other methods that you have in mind in addition to those three?