A pattern for using GADTs to subset cases

Hello!

The topic of using GADTs comes up every now and then. Personally I find that when I’m using GADTs in an attempt to optimize code I almost always quickly run into some “surprise” that doesn’t pass the type checker as I expected and then I need to spend time trying to figure it out. From what I’ve heard, that is not at all uncommon. OTOH, I’ve found polymorphic variants relatively straightforward to use and they apply to many uses. The problem with polymorphic variants, however, is that thay are compiled inefficiently (they use more memory and matching on them is less efficient).

I just discovered a kind of pattern for working with GADTs that seems to allow one to rather mechanically achieve what one could achieve easily with polymorphic variants, which is to specifically only allow / match a subset of the constructors depending on context (possibly different subsets in different contexts), using GADTs. I’m not claiming in any way that this is the first time the pattern has been discovered or that this technique would be the only or the best way to use GADTs. Far from it. I do, however, very much like the mechanical nature of this pattern. If I had known this pattern earlier, it would have saved me many hours of pulling out my hair!

So, here is a link to the gist: A pattern for using GADTs to subset cases.

I’ve now used this pattern in several cases and all of them have worked out nicely and mechanically.

19 Likes

Catala is using a similar pattern that @AltGr presented at ICFP this year: https://www.youtube.com/watch?v=pO8Z4lcY0ys .

8 Likes

This was cool to read as someone relatively new to OCaml, thanks for sharing!

2 Likes

Thanks for an interesting read.

There’s a point that I don’t get. At some point you write:

Specifying the bounds like this can be a bit cumbersome and it also requires polymorphism.

What is the polymorphism mentioned here? Are you talking about the implicit type variable in types like [< `Foo | `Bar ]? If so, I’m not sure to understand why requiring polymorphism is a bad thing.

The trick itself is neat, albeit rather well-known in programming circles it seems. However, I was genuinely intrigued by your mention of “precision use of GADTs”. What did you have in mind? So far, I have only been aware of the approach based on phantom types in the signature with GADTs you have described in the gist.

This pattern is quite nice because it give a solution to something I was wonering about for some time.
I know that Ocaml as gotten quite complex with Polymorphic/Extensible/regular variants/GADT, but I always asked myself if someone has already tried to add single inheritance, or better, arbitrary extension/restriction to basic sum types, it would be useful and could be implemented quite efficiently.

something like:

type x = A| B 
type y = x +| ( C | D) (* just a bad but obvious syntax. So type y = x.A |  x.B | y.C |  y.D *)
type z = x -| A (* type z = x.B ; restriction like this could be useful for polymorphic variants too *)
type w = x +| (A|C) (* type w= w.A | x.B | w.C *)
                    (* w.A != x.A ; w.C != y.C *)

let foo : y -> x = function  
  | C | D -> A
  | x -> x (* noop *)

This could allow ad-hoc extension/restriction on already defined type without modifying the original definition (from another project for example). Althought the trick presented in this thread looks more flexible if done from the start and with every possible case.

Are you aware of extensible type definitions ? It allows for adding new constructors (but not deleting them though), at the price of having to add a default case to every pattern matching:

type t = ..
type t += A | B
let f1 (x:t) = match x with | A -> 1 | B -> 2 | _ -> 0
type t += C
let f2 (x:t) = match x with | A -> 1 | B -> 2 | C -> 3 | _ -> 0

At the type level, I think object types could be employed to allow for extensibility and even some record masking and private fields setup (cf. this part of the manual for some interesting ideas about private row types). Consider this code in Owi for inspiration.

With object types, I think, the whatever relevant desired notions of inheritance could be simulated.

The more interesting question, though, is whether we may construct the types for tagging data without enumerating all cases in some centralized location. Could we distribute the definition of tagged data throughout the modules, and to actually process this data with functions that were not defined at the same time that the centralized data representation has been defined?

There are some pitfalls with using these open types, since it’s distinctly possible to construct cases that would result in the invocation of the refutation clause even if GADTs and type tags are involved.

Yes I’m aware. What I was talking about is really different.
Extensible types can’t be matched in an exhaustive way. Even if you hide the definition so it can not be extended you still lose the exhaustiveness check.
Extensible types are stateful, they can be extended anytime anywhere if their definition is accessible. That’s good for plugins and dynamic things, a lot less if you want to have a complete view and control.
There is also an efficiency point, extensible types are more complex, always boxed and matching is slower than regular variants.

if I did that (not saying it’s a good idea)

module Ternary_logic = struct
 type ternary_logic = bool +| Unknown
 let not = function Unknown -> Unknown | x -> Bool.not x
 let  (|||) x y = match x,y with
  |true, _ | _,true -> true
  | Unknown,_ | _,Unknown -> Unknown
  | false,false -> false
(* an so on *)
end

Here false and true should be the same as usual, internally represented as 0 and 1, and Unknown is represented as 2.
Obviously you can already do almost the same thing with polymorphic variants, but here typing errors should be better without annotation, same for exhaustiveness check. It should also be more efficient and allow to extend/restrict already defined type and reuse some of their functions without doing conversion (and without their cost or using Obj.magic like trick)

I think an important distinction between using object types and GADT for subsets (or an hypothetical way to restrict/extent variants or records) is that the former are typed structurally and the later nominally. Structural typing is more flexible but implementation is less efficient and generally type errors are worse. I don’t think you can use structural types to encode nominal ones (or maybe in contrived ways)

That’s a good question. Probably it’s possible to do something like extensible types but strictly at compile time and with exhaustiveness check with a preprocessing stage (just convert back to normal sum type). But usability will be poor.
Polymorphic variants seem the better bet, maybe they would be more useful with a little more ‘powers’ (isn’t there some talk about that?), like accessing the hidden row variables.

We could do a lot with modules, apparently, which are more efficient in terms of execution. However, there’s one snag that I have hard time working out. Could we use a first-class module that contains an open polymorphic variant value within it and produce a return that is also an open polymorphic variant value? The challenge lies in handling the type variable associated with the open polymorphic variant…

If we can handle this case, then it should be possible to implement a lot of object functionality without going all the way towards object route.

There’s also another line of thought possible — which I invite you to consider further — which is whether it’s in principle possible to implement efficient objects in an OCaml language (through e.g. special optimization passes, and perhaps under assumption that we have access to the intermediate representation of relevant subset of the entire project)?

That’s a little frustrating when starting playing with polyvars, you can’t really do much magic with them using modules or anything else except extending a statically known closed type. I really would like to at least be able to restrict them.

Maybe if the row variable was accessible some clever things would be possible

type 'a nullable = [ 'a > `Null]

But even like this there would be a need for a kind system to specify that 'a can only be a polyvar else it won’t work either.

What do you have in mind? The picture I have is just adding variants to a variant type, which is highly likely and quite obviously not what you had in mind…

Do you think there may exist a solution through the use of functors?

That’s exactly what I had in mind.

(* that's about as far you can go, nothing too clever*) 
(*you can not reuse further a type using " as " or " [>] " to include in another polyvar *)
module A = struct type t = [`A1 | `A2 of t] end 
module B = struct type 'a t = [`B of 'a] end
module C = struct type 'a t = [A.t | (int * 'a) B.t | `C of 'a] as 'a end

I mean you can not escape those limitations with any trick, functor or anything else I’m aware of with the current implementation.

Maybe there could be some changes in the future, someone more knowledgeable could maybe give some hints.
I think this could be more clearly be stated in the manual, it could prevent people wasting time trying to do impossible things.

Hm… I guess, for API purposes, we could attempt to unify the types to some abstract globally defined types in some fixed module, and then just pass simple module values. But how may we pass multiple functions in a specific module?

It looks like some generic copy-pasting. Basically, declaring an entirely new type. Maybe, somewhat unifiable through the polymorphic variants, but that’s it. It would still be copy-pasting.

I guess you can say it like this. They can be quite handy but you won’t do anything too clever either.

Can you give a simple example of what you want?
Composing in a top module functions and types coming from multiple modules without too much boilerplate?

No. I mean, something like needing a couple of functions to e.g. tag and name. Something that would’ve been passed in a record otherwise.

type 'a tag_ops =
  {
    name : 'a tag -> string
    tag : 'a -> 'a tag
  }
let add_attr ~tag_ops item =
  (* name, log, insert into property list a tagged value, etc. *)

Now, this raises an interesting consideration, whether such extensibility should be eschewed in favor of objects? Is it more plausible that objects may in principle be more optimizeable then the extensible types?

The real issue is how to avoid code duplication. It’s always possible to copy-paste the definitions and to provide global tree that could sub-setted through phantoms. But could we do better? Even with the use of functors, could we do any better?

A thought struck me, maybe we could indeed provide the tree type somewhere centrally, and even relatively trivially pattern-match it. But, the matching of the specific cases should be much more automatable.

I wonder, if we could in fact attain the desired — essentially copy-pasting methodology — with the syntax extension, something from ppx family? Then we could even add cases in new instances of type definitions, and base them on the old ones.