The nested datatype zoo

Just like there exists a programming languages zoo, nested datatype pets merit their own mini-zoo.
I propose a simple minimal signature :

module type Datatype =
sig
   type 'a t
   val map : ('a -> 'b) -> 'a t -> 'b t
   val size : 'a t -> int 
   type index
   val member : index -> 'a t -> 'a
end

The first mentioned nested datatype is 'a Nest.t :

(* a nested datatype from the paper "Nested Datatypes"
 * by Richard Bird and Lambert Meertens 1998
 * size & member are implemented by Chris Okasaki
 *)
module Nest
=
struct
   type 'a t =
      | Nil
      | Cons of 'a * ('a * 'a) t
   let pair_map f (x,y) =
      (f x,f y)
   let rec map : 'a 'b . ('a -> 'b) -> 'a t -> 'b t = 
      fun f -> function
      | Nil -> Nil
      | Cons(h,t) -> Cons(f h,map (pair_map f) t)
   let rec size : 'a . 'a t -> int = function
      | Nil -> 0
      | Cons(_,t) -> 1 + 2 * size t
   type index =
      int
   let rec member : 'a . index -> 'a t -> 'a =
      fun n -> function
      | Nil -> failwith "Nest.member"
      | Cons(h,t) ->
            if n=0 then h else
            let x,y = member (n/2) t in
            if n mod 2 = 0 then x else y
end

The next is the incomplete 'a Bush.t :

(* another nested datatype from the paper "Nested Datatypes"
 * by Richard Bird and Lambert Meertens 1998
 * i don't see how to implement size & member
 *)
module Bush
=
struct
   type 'a t =
      | Nil
      | Cons of 'a * 'a t t
   let rec map : 'a 'b . ('a -> 'b) -> 'a t -> 'b t = 
      fun f -> function
      | Nil -> Nil
      | Cons(h, t) -> Cons(f h,map (map f) t)
   let rec size : 'a . 'a t -> int = function 
      | _ -> assert false
   type index =
      unit (* i don't even see what the index type should be *)
   let rec member : 'a . index -> 'a t -> 'a = function
      | _ -> assert false
end

The last known to me is the incomplete 'a Lush.t :

(* the last nested datatype known to me 
 * i don't see how to implement size without to_list
 * i don't see how to fully implement member
 *) 
module Lush
=
struct
   type 'a t =
      | Item of 'a
      | List of 'a list t
   let rec map : 'a 'b. ('a -> 'b) -> 'a t -> 'b t =
      fun f -> function
      | Item x -> Item (f x)
      | List l -> List (map (List.map f) l)
   let rec to_list : 'a. 'a t -> 'a list = function
      | Item x -> [x]
      | List l -> List.concat (to_list l);;
   let size t =
      List.length (to_list t)
   type index =
      int list
   let member =
      fun (i:index) t ->
      match i,t with
      | _,Item _ -> assert false
      | [a],List(Item l) -> List.nth l a
      | [a;b],List(List(Item l)) -> List.nth (List.nth l a) b
end

Can you help me to complete the zoo with size & member functions and/or an innovative new nested datatype ?
Thank for your effort.

2 Likes

There’s a few leaps of logic necessary to start implementing heterogeneous containers, so in the hopes of transferring some of that: here’s a walkthrough of the implementation (and underlying principles) of the Bush case.

When working with heterogeneous nested types, there’s often a need to introduce more structured auxilliary values to keep track of “where you are” in the value with respect to the root. As an example, consider counting the number of elements in the following bush:

Cons (1, Cons (Cons (2, Cons (Cons (3, Nil), Nil)), Nil))

When we fold over this bush, we’ll first count the head 1, then look down the right child. This sub-bush has type int Bush.t Bush.t, but how many elements does it contain? Well, it depends on perspective: it contains 1 int Bush.t and 2 ints – in the context of our overall problem, we care about the number of ints. This suggests that we’ll need to keep some value representing our nesting depth, to tell us how far to unwrap before we get to a leaf that we can count as an “element” of the original tree.

From here on, I’m going to change the bush type to use list syntax because nested Cons ... makes my eyes unhappy:

module Bush = struct
  type 'a t = [] | ( :: ) of 'a * 'a t t

We can store our nesting depth as a natural number, with type parameters that reflect the nesting depth being referred to by the value:

  (* Peano naturals, typed such that the [n]-th integer is ('z t ... t, 'z) with
     [n] applications of [t]. *)
  type (_, _) nat = Z : ('z, 'z) nat | S : ('a, 'z) nat -> ('a t, 'z) nat

Now we stand a chance at implementing size, we just have to be careful to update our nesting depth appropriately as we traverse the tree. (In this case, I’m taking “nesting depth” to be the number of times the head value has been wrapped – or, equivalently, the number of intermediate nodes on the left spine.) Fortunately, the type system forces us into the correct solution almost everywhere.

  (* Count the number of [a]s in the given bush. *)
  let size : type a. a t -> int =
    (* [aux acc n t] counts the number of elements in [t] and adds it to [acc],
       given that there are [n] intermediate nodes between [t] and its leftmost
       leaf. *)
    let rec aux : type b. int -> (b, a) nat -> b t -> int =
     (* A [(b, a) nat] means we're counting [a]s in a bush of type [b]. *)
     fun acc n t ->
      match (n, t) with
      | _, [] -> acc
      | Z, _ :: xs ->
          (* Value on the left is a leaf: count it, then recurse on the right.
             (We'll have [S Z = 1] intermediate node between the right child and any
             values on its left spine.) *)
          aux (acc + 1) (S Z) xs
      | S n, x :: xs ->
          (* Count over the left subbush, decrementing size of the left spine ... *)
          let acc = aux acc n x in
          (* ... then over the right subbush, _incrementing_ size of the left spine. *)
          aux acc (S (S n)) xs
    in
    (* At the root: no intermediate nodes between [t] and its leftmost leaf. *)
    fun t -> aux 0 Z t

The case of member is similar, except this time the user must supply us with the structured value to tell us the type of leaves that they expect. In this case, an “index” is really a finite path of Left | Right choices terminating in the value that we want. The nature of bushes is that each path we choose fixes the relationship between the type of the root bush and the type of the value being pointed to: each time we go Left, we lose a layer of nesting; each time we go Right, we gain one. We can reflect this using a pair of type parameters:

  (* An [('a, 'z)] index is a path from the root of a bush of type ['a] to a value of type ['z]. *)
  type (_, _) index =
    | Z : ('z, 'z) index (* [Z] refers to the head element of a bush (c.f. index 0 of a list). *)
    | L : ('a, 'z) index -> ('a t, 'z) index (* [L path] means "Go left, then follow [path]" *)
    | R : ('a t, 'z) index -> ('a, 'z) index (* [R path] means "Go right, then follow [path]" *)

The user will give us one of these paths, and we’ll just attempt to follow it:

  let rec member : type a z. (a, z) index -> a t -> z =
   fun i t ->
    match (i, t) with
    | _, [] -> raise Not_found
    | Z, x :: _ -> x (* We've arrived! Return the head of the bush. *)
    | L n, xs :: _ -> member n xs (* Move left *)
    | R n, _ :: ys -> member n ys (* Move right *)
end

Some test cases for the above:

let () =
  let pr fmt = Format.kasprintf print_endline fmt in
  let t1 =
    (* Example taken from "Nested Datatypes", Bird & Meertens. *)
    Bush.
      [
        4;
        [ 8; [ 5 ]; [ [ 3 ] ] ];
        [ [ 7 ]; []; [ [ [ 7 ] ] ] ];
        [ [ []; [ [ 0 ] ] ] ];
      ]
  in
  pr "size: %d" (Bush.size t1) (* = 7 *);
  pr "L: %d" (Bush.member Z t1 (* = 4 *));
  pr "L: %d" (Bush.member (R (L Z)) t1) (* = 8 *);
  pr "L: %d" (Bush.member (R (L (R (L Z)))) t1) (* = 5 *);
  pr "L: %d" (Bush.member (R (R (R (L (L (L Z)))))) t1) (* = Not_found *)

I believe the Lush.t case is conceptually quite similar:

  • when recursing, use auxilliary values to track nesting depth;
  • when indexing, consume a well-typed path that relates the source type (the type of the root) to the destination type (the type of the value to be returned), and follow it in the obvious way.

You may be interested in Oleg’s hlist library, which applies these principles to the case of heterogeneous lists. (He’s referring to the structured indices as “lenses”, which calls out to the more general concept of “optics” in functional programming.)

I hope some of this helps!

3 Likes

@CraigFe
Of course it helps :slight_smile:
I will try to complete Lush without help, reading your answer until it’s crystal clear.
It’s harder than the Okasaki (Purely functional data structures) book content. No real surprise, i expected it to be.
I have some experience with indexed types, GADTs and Coq and this is not luxury, it looks more like a necessity.

@monoidoid
Your alternate proposition is very welcomed. It looks much more familiar to me (more programming, less logic). If i can do a map chances are i can also do a fold. The result may not be always optimally efficient i very much like how it is beginner-friendly. I will prioritize this way and see how far i can go. My intuition is that member requires a lazy catamorphism rather than the same fold_right catamorphism.

1 Like

I think it might be easier if you define a fold first and then use that for defining other functions. In your size example for Bush, what you could do is map size on att, then you have int t, and then you would like to sum all these ints. You already defined a map, but the last part requires a fold.

(* Bush *)
type 'a t =
    | []
    | ( :: ) of 'a * 'a t t

  let rec fold_right : 'a. ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b =
    fun f at z -> match at with
    | [] -> z
    | a :: att -> f a (fold_right (fold_right f) att z)

  let size at = fold_right (fun _ n -> n + 1) at 0

2 Likes

@monoidoid

# size ([[];[[[9]]]]);;
- : int = 2

The correct answer should be 1, isn’t it ?

[[];[[[9]]]] has type int t t t. So size will count int t ts, and I see two of them: [] and [[9]].

1 Like

@monoidoid of course stupid me