Heterogeneous lists (difflist) puzzle: functions versus pairs

Consider the following definition of heterogeneous lists:

type 'a t =
| ( :: ) : 'hd * 'tl t -> ('hd * 'tl) t
| [] : unit t

We get the following types:

let x : (int * (string * unit)) t = [ 1; "one" ]

This can get extremely ugly with very large lists

Ugly real life example from our codebase
    type writable_columns =
      Timmy.Time.t Crud.Writable_column.optional
      * (Timmy.Week.t Crud.Writable_column.optional
        * (Timmy.Time.t Crud.Writable_column.optional
          * (Routine_schemas.Integration.Id.t Crud.Writable_column.optional
            * (string Crud.Writable_column.optional
              * (Schematic.Json.t Crud.Writable_column.optional
                * (Routine_schemas.Id.t Crud.Writable_column.optional
                  * (Timmy.Date.t Crud.Writable_column.optional
                    * (bool Crud.Writable_column.required
                      * (Routine_schemas.Task_recurrent.Id.t
                        * (Timmy.Date.t Crud.Writable_column.optional
                          * (string Crud.Writable_column.required
                            * (Uri.t Crud.Writable_column.optional * unit))))))))))))

Since the parameter of our GADT type is mute, we can make this much better by replacing 'hd * 'tl with 'hd -> 'tl; since -> is right-associative, parentheses become useless and indentation much better:

Cleaner version
type writable_columns =
  Timmy.Time.t Crud.Writable_column.optional ->
  Timmy.Week.t Crud.Writable_column.optional ->
  Timmy.Time.t Crud.Writable_column.optional ->
  Routine_schemas.Integration.Id.t Crud.Writable_column.optional ->
  string Crud.Writable_column.optional ->
  Schematic.Json.t Crud.Writable_column.optional ->
  Routine_schemas.Id.t Crud.Writable_column.optional ->
  Timmy.Date.t Crud.Writable_column.optional ->
  bool Crud.Writable_column.required ->
  Routine_schemas.Task_recurrent.Id.t Crud.Writable_column.optional ->
  Timmy.Date.t Crud.Writable_column.optional ->
  string Crud.Writable_column.required ->
  Uri.t Crud.Writable_column.optional ->

This works, however, as long as this type is indeed mute. We need to be able to convert these lists to nested pairs, for instance to feed them to caqti which understandably consumes nested pairs. With 'hd * 'tl this is straightforwad:

let rec to_pairs : type a. a t -> a =
  | a :: b -> (a, to_pairs next b)
  | [] -> ()

let _ : (int * (string * unit)) = to_pairs [ 1; "one" ]

With 'hd -> 'tl however, it seems much hairier to me. The best solution I could come up with introduces a witness type that essentially proves to the typer that both lists have the same length:

type (_, _) convert =
  | Cons : ('b, 'c) convert -> ('a -> 'b, 'a * 'c) convert
  | Null : (unit, unit) convert

let rec to_pairs : type a b. (a, b) convert -> a t -> b =
 fun convert l ->
  match (convert, l) with
  | Cons next, a :: b -> (a, to_pairs next b)
  | Null, [] -> ()

This however makes conversion to pairs more complex and quite redundant:

let _ : (int * (string * unit)) = to_pairs (Cons (Cons Null)) [ 1; "one" ]

This is hardly satisfying to me as the syntax issue was just shifted to to_pairs, on top of having a runtime cost for something that is purely static. I’m under the impression there is no sound way to type this, which is quite frustrating since it looks purely syntactical.

One could argue that the simplest solution would be to just bite the bullets and accept this 'hd * 'tl nesting and not throw a fit for pure lexical reasons, but there are also cases where I would like to derive actual functions (à la Printf) from such lists, and thus have this problem the other way around.

Any insight appreciated !

1 Like

I guess what you can do is convert an heterogeneous list into a higher-order function that takes an input function, and calls it on the elements of the list (the client function thus takes as many arguments as the size of the list):

type 'a t =
| ( :: ) : 'hd * 'tl t -> ('hd -> 'tl) t
| [] : unit t

let rec tofun : type a. a t -> a -> unit =
  | x :: xs -> fun f -> tofun xs (f x)
  | [] -> fun f -> ()

let x : _ t = [ 1; "one" ]
let () = tofun x (fun x y -> print_int x; print_string y)

In that case the return type of the client function is fixed to unit so you’d have to smuggle out the values of the list by doing side effects. It seems that you can have a polymorphic return type by changing the definition of t:

type ('a, 'b) t =
| ( :: ) : 'hd * ('tl, 'b) t -> ('hd -> 'tl, 'b) t
| [] : ('b, 'b) t

let rec tofun : type a b. (a, b) t -> a -> b =
  | x :: xs -> fun f -> tofun xs (f x)
  | [] -> fun f -> f

let x : _ t = [ 1; "one" ]
let _ (* 4 *) = tofun x (fun x y -> x + String.length y)
let _ (* "1one" *) = tofun x (fun x y -> string_of_int x ^ y)

But indeed that’s still not producing nested pairs from the heterogeneous list.

Yeah you need some witness to convert from arrows to products and vice-versa… Have you considered having both in your list type? :-°

type (_, _) t =
  | ( :: ) : 'hd * ('tl_arr, 'tl_prod) t -> ('hd -> 'tl_arr, 'hd * 'tl_prod) t
  | [] : (unit, unit) t

type 'a pretty = Pretty : ('a, _) t -> 'a pretty
type 'a ugly   = Ugly   : (_, 'a) t -> 'a ugly

I’m indeed familiar with these tricks, but unfortunately as you mentionned it still does not provide me with my nested pairs :'-)

The solution mentionned by @art-w , which is having BOTH types, does indeed enable both usages:

type ('f, 'pairs) t = 
| ( :: ) : ('a * 'b t) -> ('a -> 'b, 'a * 'b) t
| [] : (unit, unit) t

Except my initial issue was the struggle in typing the nested pairs type and I would now have to type both the functional and nested pairs, making it even worse.

I suppose there is no easy solution to my problem for now, it would require some fancy way to translate types:

(* Pseudo code *)
type 'l pairs = 
  match 'l with
  | 'hd -> 'tl -> ('hd * 'tl pairs)
  | 'null -> 'null

But I suppose this is a bit too convoluted, if only feasible/decidable at all.

Thanks for the insights!

Indexed Type Families, what you wrote in pseudo OCaml here, can be made decidable as far as I know.
And I believe art-w’s answer can be made very usable, assuming you want this to be able to write types by hand.

let xs : (int -> string -> unit, _) t = [1; "one"]
let rec to_pairs : type a x. (x, a) t -> a = function
  | a :: b -> (a, to_pairs b)
  | [] -> ()
let ys = to_pairs xs

Although you can’t use the wildcard inference trick in interface files or module sigs.
You could even make it so Pretty/Ugly is zero-cost with [@@unboxed]. I don’t know how useful those are, considering there’s no way I can think of for you to recover the (existential) half they hide. Maybe they’re nice for printing.the type of a hlist defined that way interactively.

1 Like

Unfortunately my issue is that I need to write-out these types in module signatures as they are consumed by functors.

If you need to lift some complex type computed by a GADT at the term-level to the module level, you can use a first-class module to bridge the gap:

module type t = sig type t end
type 'a ty = (module t with type t = 'a)
let type_of (type x) (x:x): x ty = (module struct type t = x end)

type z = private Z type !'a s = private S of 'a
type 'a nat = Z: unit nat | S: 'a nat -> 'a s nat

module T = (val type_of (S (S Z))) 
type u = T.t