Narrowing variant types alternatives

gadt
adt
#1

Consider the following type representing types in a programming language:

type t =
  | Name of string
  | Record of (string * t) list
  | Array of (t * int)

Is there a way to limit the type t in Array of (t * int) to be only a Name or Array? i.e. I would like the following to cause a compile error:

let _ = Array (Record []) in
(* ... *)

so that Array types can only be built from Name or Array constructors. Currently I find myself with a few special conditions during runtime that cannot possibly happen but that I need to check for exhaustiveness, because I don’t know how to encode a more strict representation.

I hope I explained myself, I’m a beginner to the language. Thanks.

1 Like
#2

Yes, you can use GADT exactly for this purpose

type scalar = Scalar
type vector = Vector
type _ t = 
  | Name : string -> scalar t
  | Record : string * _ t -> vector t
  | Array : scalar t * int -> scalar t

Probably, I chose the wrong names for type constraints :slight_smile:

Another option would to use polymorphic variants, but this will require some restructuring of your original design. For an example, you can look into the C type system representation in BAP.

4 Likes
#3

]I’ve never used 'em, but my understand was,] This is what private constructors were supposed to be for, right?

#4

@ivg’s GADT suggestion is probably the most ergonomic one, but for completeness, here are two others, an ADT one, and one using polymorphic variants (as also suggested by @ivg).

module ADT = struct
  type not_record =
    | Name of string
    | List of t
    | Array of (not_record * int)
  and t =
    | Record of (string * t) list
    | Not_record of not_record
end

module Polymorphic_variants = struct
  type 'r not_record = [
    | `Name of string
    | `List of 'r (* any t is allowed *)
    | `Array of 'r not_record * int
  ]

  type 'r t_open = [
    | `Record of (string * 'r) list
    | 'r not_record
  ]

  type t = t t_open

  let _ : t = `List (`Record ["foo", `Array (`List (`Record []), 10)])

  (* Does not type-check:
  let _ : t = `List (`Record ["foo", `Array (`Record [], 10)]) *)
end

The disadvantage of the ADT solution is that you need the Not_record constructor, but smart constructors could help in this case.

The disadvantage of the polymorphic variant solution is that it only catches your invariant violation if the value is type-annotated.

3 Likes
#5

Not really, they could be used as a part of a solution which is more practical than GADT.

What’s wrong with GADT?

The problem with GADT is that they mix two orthogonal concepts - types and data representation (aka data types). While ADT define data representation, constraints, attached to GADT constructors, are pure types, as their only purpose is to constrain the set to which the type belongs. On one hand it is very nice, we can kill two rabbits at once, construct data and prove its correctness by construction. But this comes with a price because you have to repeat the proof every time you reconstruct your data. In other words, you can’t serialize and deserialize your GADT, as the constraint itself is not serializable (because it is not data).

Therefore, in real life, where you occasionally need to pass your AST around different hosts, store them in cache and files, etc, using GADT immediately becomes a nuisance. As every time when you need to read your AST from a string, you have to repeat the process of proving by construction. In other words, you need to write a typed parser, that does parsing (data processing) and typing simultaneously. This is notoriously hard and, in fact, is only possible in a small number of simple cases. E.g., it is practically1 impossible to reconstruct an arrow type from its string representation,

    type 'a app = 
      | Unit : unit app
      | App : 'a * 'b app -> ('a -> 'b) app

Separate the concerns

Therefore, a practical solution would be to separate the concerns back and use ADT for data types, and module types for types. In other words, we will hide our data representation behind an abstraction wall. And we may even use the private keyword to allow others to peek into our privates through the holes in our abstraction wall.

The approach basically mimics the GADT approach, except that we move the constraints into the module type,

module Types : sig = 
  type scalar
  type vector
  type data = private
    | Name of string
    | Record of (string * data) list
    | Array of (data * int)

  type 'a t = data
  val name : string -> scalar t
  val record : (string * _ t) list -> vector t
  val array : scalar t -> int -> scalar t
end = struct 
  type scalar
  type vector

  type data =
    | Name of string
    | Record of (string * data) list
    | Array of (data * int)

   type 'a t = data

   let name x = Name x
   let record elts = Record (elts)
   let array elt sz = Array (elt,sz)
end

Now, our invariant is protected, so that nobody outside of the module could create an incorrect tree. However, inside the module, behind the wall, we are free to do whatever we want, since the constraint is now a phantom type, the compiler will allow us to ascribe any type to any tree. Of course, it will allow us to ascribe wrong types, so it is now our responsibility to implement the type checker correctly. But now we can read our data from text and give it back the type constraint. We can even reify the type constraint into a concrete data type representation and store it with AST.

In this solution we use the private keyword to expose some of our abstraction. So that a user could use a regular pattern match, however this highlights a small problem. When we turned away from GADT we lost an important feature, as beyond proving the data type is constructed correctly, GADT enables us to use this proof when we deconstruct the data. With our plain ADT representation, which is now impossible to construct incorrectly (using the Types interface), we still have to deal with an extra case, when we do

function Array (elt,n) -> match elt with
  | Name s -> name s
  | Array (elt',m) -> array elt m
  | Record _ -> assert false

Since we don’t have the constraints anymore, the compiler can’t refute the Record branch (and indeed, an incorrect code inside of the Types module can construct such representation, so we can’t blame the compiler for that.

And although assert false is more or less fine here, we don’t want a casual user to rely on our internal invariants (an even know about them because an invariant, that leaks the module is no longer an invariant). The reason why the invarinat escaped is because we exposed our internal representation, we leaked it through the abstraction via the private keyword. Therefore, the solution is to hide it,

module Types : sig = 
  type scalar
  type vector
  type _ t 

  val name : string -> scalar t
  val record : (string * _ t) list -> vector t
  val array : scalar t -> int -> scalar t

  val case : _ t -> 
     scalar:(scalar t -> 'a)
     vector:(vector t -> 'a) -> 'a

  val match_scalar : scalar t -> 
      name:(string -> 'a) ->
      array:(scalar t -> int -> 'a) -> 'a

  val match_vector : vector t ->
     record:((string * _ t) list -> 'a) -> 'a

end = struct 
  type scalar
  type vector

  type data =
    | Name of string
    | Record of (string * data) list
    | Array of (data * int)

   type 'a t = data

   let name x = Name x
   let record elts = Record (elts)
   let array elt sz = Array (elt,sz)

   let case t ~scalar ~vector = match t with
     | Record _ -> vector t
     | Array _ | Name _ -> scalar t

   let match_scalar t ~name ~array  = match t with
    | Array (elt,sz) -> array elt sz
    | Name s -> name s
    | _ -> assert false

  let match_vector t ~record = match t with 
    | Record elts -> record elts
    | _ -> assert false
end

We confined our invariant inside our module, and we can use assert false which could be triggered only if we bugged our code inside of the Types modules, which is now considered the trusted kernel.

The users of our module now can use the whole power of constraints, and basically, use our typed AST as it was GADTs but without any pain associated with them.

Final Solution, Tagless Final

Finally, we can notice, that cases in our matches totally mimic the types of our branch constructors and this smells like a generalization opportunity. Indeed, we can abstract our abstraction into the module type

(* keep constraints not abstract, and make them provably distinct in
    case if someone would like to use GADT in their representations.  *)

type scalar = private Scalar_constraint
type vector = private Vector_constraint

module type Types = sig
  type _ t

  val name : string -> scalar t
  val record : (string * _ t) list -> vector t
  val array : scalar t -> int -> scalar t
end

(* our old Types module is just _a_ representation, let's call
   it an AST *)
module Ast : sig
  include Types

  val case : 'a t ->
    scalar : (scalar t -> 'a) ->
    vector : (vector t -> 'a) -> 'a

  module Analysis(L : Types) : sig
    val scalar : scalar t -> scalar L.t
    val vector : vector t -> vector L.t
  end
end = struct
  type data =
    | Name of string
    | Record of (string * data) list
    | Array of (data * int)

  type 'a t = data

  let name x = Name x
  let record elts = Record (elts)
  let array elt sz = Array (elt,sz)

  let case x ~scalar ~vector = match x with
    | Name _ | Array _ -> scalar x
    | Record _ -> vector x

  module Analysis(L : Types) = struct

    let rec scalar = function
      | Name s -> L.name s
      | Array (elt,sz) -> L.array (scalar elt) sz
      | Record _ -> assert false

    let rec vector = function
      | Name _ | Array _ -> assert false
      | Record ((_, (Name _ | Array _)) :: _ as elts) ->
        L.record @@
        List.map (fun (name,fld) -> name, scalar fld) elts
      | Record elts ->
        L.record @@
        List.map (fun (name,fld) -> name, vector fld) elts
  end
end

And now we can implement the analysis as simple as,


module Sizeof = Ast.Analysis(struct
    type 'a t = int
    let name _ = 4
    let array elt sz = elt * sz
    let record elts = List.fold_left (fun s (_,sz) -> s + sz) 0  elts
  end)

let sizeof t = Ast.case t
    ~scalar:Sizeof.scalar
    ~vector:Sizeof.vector

This approach is called Taggles Final Style and is well-explored.

A few grains of salt

Although it all may look nice (or vice verse), there is a grain of salt. It doesn’t work as we have a problem in our original design, we dropped the type constraint of the record constructor. In other words, our record constructor, (string * 'a t) -> scalar t is non-injective, as it projects different types into the same type. Therefore we can’t reconstruct the constraint back. Therefore, probably a better type system representation would be

type array = array
type 'a tuple = Tuple

type 'a t
val array : array t -> int -> array t
val tuple : 'a t -> 'b tuple t -> ('a -> 'b) tuple t
val unit : unit tuple t

and the case analysis will look something like this

val case : _ t ->
  unit : (() -> 'a) ->
  tuple : (('a -> 'b) tuple t -> 'a) ->
  array : (array t -> 'a) -> 'a
val split : ('a -> 'b) tuple t -> 'a t * 'b tuple t

1) You can do this for a fixed number of constraints, e.g., int -> unit, int -> string -> unit, etc, not for arbitrary constraint. You would need a backtracking parser, which will try on each branch all possible cases. As a result, not only the performance will suffer, but the complexity of the parser itself.

15 Likes