Faking type families to encode extensible tagless-final interpreters

Hello,

I am trying to define a language in the style of the tagless-final interpreters of Carett, Kiselyov and Shan (my goal is to perform tagless-final abstract interpretation, as described in a forthcoming PLDI paper). I would like the language to be extensible: new operations, but also new types can be added.

This is an exemple of signature for such an extensible language.

type 'a typ = ..

type boolean = |
type 'a typ += Boolean: boolean typ
type 'a var_identifier = 'a typ * int

module type Lang_Base = sig
  type 'a value
  val true_: boolean value
  val not: boolean value -> boolean value

  type store 
  val var: 'a var_identifier  -> store -> 'a value
  val assign: 'a var_identifier -> 'a value -> store -> 'a value
end

type integer = |
type 'a typ += Integer: integer typ


module type Lang_With_Int = sig
  include Lang_Base
  val const: int -> integer value
  val add: integer value -> integer value -> integer value
  val leq: integer value -> integer value -> boolean value
end

How to implement this? One possibility would be to use GADTs:

module Lang_Base_Interp(* :Lang_Base *) = struct
  type !'a value = ..
  type 'a value +=
    | BoolVal: bool -> boolean value 
  let true_ = BoolVal true
  let not ((BoolVal x):boolean value) = (BoolVal (not x))

  type store = ..             (* A polymorphic map *)
  let var _ = assert false
  let assign _ = assert false
end

module Lang_With_Int_Interp:Lang_With_Int = struct
  include Lang_Base_Interp
  type 'a value +=
    | IntVal: int -> integer value

  let const i = IntVal i
  let add (IntVal a) (IntVal b) = (IntVal (a + b))
  let leq (IntVal a) (IntVal b) = (BoolVal (a <= b))
end

There are several problems here: we need to insert BoolVal and IntVal conversions every where, all values are boxed, we have non-exhaustive pattern matching errors everywhere (and the code does dynamic checks).

An alternative solution based on equality witnesses, is this one:

type ('a,'b) eq = Eq: ('a,'a) eq

let cast (type a) (type b) (Eq: (a,b) eq) (a:a) : b = a;;

module Lang_Base_Interp2(* :Lang_Base *) = struct
  type 'a value
  let to_bool: (boolean value,bool) eq = Obj.magic Eq
  let from_bool: (bool,boolean value) eq = Obj.magic Eq      

  let true_ = cast from_bool true
  let not x = cast from_bool @@ not @@ cast to_bool @@ x

  type store = unit             (* A polymorphic map *)
  let var _ = assert false
  let assign _ = assert false
end

module Lang_With_Int_Interp2:Lang_With_Int = struct
  include Lang_Base_Interp2
  let to_int: (integer value,int) eq = Obj.magic Eq
  let from_int: (int,integer value) eq = Obj.magic Eq      

  let const i = cast from_int i
  let add a b = cast from_int @@ (cast to_int a) + (cast to_int b)
  let leq a b = cast from_bool ((cast to_int a) <= (cast to_int b))

end

This is a manual encoding of an extensible type family constructor (I think, in Haskell, this could be encoded directly as

   data family Value a
   data instance Value Integer = Int
   data instance Value Boolean = Bool

It solves some of the problems: the values are now unboxed, and the code no longer performs any dynamic check and the generated code is as performant as can be. There are still two problems though: the use of Obj.magic for the equality proof, and the need to use cast operations. Ideally we would just add a “constraint boolean value = bool constraint integer value = int”, but I don’t think that we can do this (even with Obj).

My question is the following: is there a better way to implement this use case?

The paper “Unboxing Mutually Recursive Type Definitions in OCaml” ( [1811.02300] Unboxing Mutually Recursive Type Definitions in OCaml ) by Colin, Lepigre and Scherer mentions a possible solution (unboxing GADT with several constructors, although this would not solve the pattern-matching error problem due to extensibility) and they conclude by “It seems fishy to only allow pattern-matching on partially-determined instances of the type, and we did not investigate further.” Maybe the above provides a compelling use case for such a feature?

Is there a strong reason for integer and boolean type tags to be opaque? If you may lift the equality witnesses into real type equalities, there is a trivial implementation.

type boolean = bool
type integer = int

...

module Lang_Base_Interp2 : Lang_Base with type 'a value = 'a = struct
  type 'a value = 'a

  let true_ = true
  let not = not

  type store = unit             (* A polymorphic map *)
  let var _ = assert false
  let assign _ = assert false
end

module Lang_With_Int_Interp2:Lang_With_Int = struct
  include Lang_Base_Interp2

  let const i = i
  let add = ( + )
  let leq = ( <= )
end
1 Like

Is there a strong reason for integer and boolean type tags to be opaque?

Yes: we can have multiple interpretations of the integer and boolean types. For instance in a simple abstract interpreter, integer would represent intervals.

One solution the vein that what you propose is then to add the types boolean and integer to the signature:

module type Lang_Base = sig
  type 'a value
  type boolean
  val true_: boolean value
  val not: boolean value -> boolean value

  type store 
  val var: 'a var_identifier  -> store -> 'a value
  val assign: 'a var_identifier -> 'a value -> store -> 'a value
end

This works well when we have only monomorphic types. The problem is that polymorphic identifiers like 'a typ become more complicated, as they now need to be parameterized by the types. Naively, we could do:

type ('a,'boolean,'integer) typ =
  | Boolean: ('boolean,'boolean,'integer) typ
  | Integer: ('integer,'boolean,'integer) typ            

But besides the large type signatures, it cannot be extended at all, as the number of type parameters depends on the number of types that we will need.

I found a workaround for this problem: it seems that some type-level functions can be implemented using OCaml’s type system, namely projection on type.

type ('a,'types) typ = ..
type 'a proj_boolean = 'b constraint 'a = <boolean:'b;..>
type ('a,types) typ += Boolean: ('types proj_boolean, 'types) typ

type 'a proj_integer = 'b constraint 'a = <integer:'b;..>
type ('a,types) typ += Integer: ('types proj_integer, 'types) typ

The full example now becomes:

type ('a,'types) typ = ..
type ('a,'types) var_identifier = ('a,'types) typ * int
                                  
type 'a proj_boolean = 'b constraint 'a = <boolean:'b;..>
type ('a,'types) typ += Boolean: ('types proj_boolean, 'types) typ

module type Lang_Base = sig
  type 'a value
  type boolean
  type types = <boolean:boolean>
  val true_: types proj_boolean value
  val not: boolean value -> boolean value

  type store 
  val var: ('a,types) var_identifier  -> store -> 'a value
  val assign: ('a,types) var_identifier -> 'a value -> store -> 'a value
end

type 'a proj_integer = 'b constraint 'a = <integer:'b;..>
type ('a,'types) typ += Integer: ('types proj_integer, 'types) typ

module type Lang_With_Int = sig
  include Lang_Base
  type integer
  type types = <boolean:boolean;integer:integer>
  val const: int -> types proj_integer value
  val add: types proj_integer value -> types proj_integer value -> types proj_integer value
  val leq: types proj_integer value -> types proj_integer value -> boolean value
end

module Lang_Base_Interp(* :Lang_Base *) = struct
  type boolean = bool
  type 'a value = 'a
  type types = <boolean:boolean>
  let true_ = true
  let not x = not x

  type store = unit             (* A polymorphic map *)
  let var _ = assert false
  let assign _ = assert false
end

module Lang_With_Int_Interp:Lang_With_Int = struct
  include Lang_Base_Interp
  type integer = int
  type types = <boolean:boolean;integer:integer>
  let const i = i
  let add a b = a + b
  let leq a b = a <= b
end

It seems to work but I remember that I had issues going in that direction (besides the signatures being rather heavy), in particular because values like Integer are now polymorphic. I will check if I can find them.

A minor issue is also that we have to repeat the list of types that we include when we redefine types, but maybe there is a solution to this problem (I am not very familiar with OCaml’s row types).