# The nested datatype zoo

Just like there exists a programming languages zoo, nested datatype pets merit their own mini-zoo.
I propose a simple minimal signature :

``````module type Datatype =
sig
type 'a t
val map : ('a -> 'b) -> 'a t -> 'b t
val size : 'a t -> int
type index
val member : index -> 'a t -> 'a
end
``````

The first mentioned nested datatype is` 'a Nest.t` :

``````(* a nested datatype from the paper "Nested Datatypes"
* by Richard Bird and Lambert Meertens 1998
* size & member are implemented by Chris Okasaki
*)
module Nest
=
struct
type 'a t =
| Nil
| Cons of 'a * ('a * 'a) t
let pair_map f (x,y) =
(f x,f y)
let rec map : 'a 'b . ('a -> 'b) -> 'a t -> 'b t =
fun f -> function
| Nil -> Nil
| Cons(h,t) -> Cons(f h,map (pair_map f) t)
let rec size : 'a . 'a t -> int = function
| Nil -> 0
| Cons(_,t) -> 1 + 2 * size t
type index =
int
let rec member : 'a . index -> 'a t -> 'a =
fun n -> function
| Nil -> failwith "Nest.member"
| Cons(h,t) ->
if n=0 then h else
let x,y = member (n/2) t in
if n mod 2 = 0 then x else y
end
``````

The next is the incomplete `'a Bush.t` :

``````(* another nested datatype from the paper "Nested Datatypes"
* by Richard Bird and Lambert Meertens 1998
* i don't see how to implement size & member
*)
module Bush
=
struct
type 'a t =
| Nil
| Cons of 'a * 'a t t
let rec map : 'a 'b . ('a -> 'b) -> 'a t -> 'b t =
fun f -> function
| Nil -> Nil
| Cons(h, t) -> Cons(f h,map (map f) t)
let rec size : 'a . 'a t -> int = function
| _ -> assert false
type index =
unit (* i don't even see what the index type should be *)
let rec member : 'a . index -> 'a t -> 'a = function
| _ -> assert false
end
``````

The last known to me is the incomplete `'a Lush.t` :

``````(* the last nested datatype known to me
* i don't see how to implement size without to_list
* i don't see how to fully implement member
*)
module Lush
=
struct
type 'a t =
| Item of 'a
| List of 'a list t
let rec map : 'a 'b. ('a -> 'b) -> 'a t -> 'b t =
fun f -> function
| Item x -> Item (f x)
| List l -> List (map (List.map f) l)
let rec to_list : 'a. 'a t -> 'a list = function
| Item x -> [x]
| List l -> List.concat (to_list l);;
let size t =
List.length (to_list t)
type index =
int list
let member =
fun (i:index) t ->
match i,t with
| _,Item _ -> assert false
| [a],List(Item l) -> List.nth l a
| [a;b],List(List(Item l)) -> List.nth (List.nth l a) b
end
``````

Can you help me to complete the zoo with `size` & `member` functions and/or an innovative new nested datatype ?

2 Likes

There’s a few leaps of logic necessary to start implementing heterogeneous containers, so in the hopes of transferring some of that: here’s a walkthrough of the implementation (and underlying principles) of the `Bush` case.

When working with heterogeneous nested types, there’s often a need to introduce more structured auxilliary values to keep track of “where you are” in the value with respect to the root. As an example, consider counting the number of elements in the following bush:

``````Cons (1, Cons (Cons (2, Cons (Cons (3, Nil), Nil)), Nil))
``````

When we fold over this bush, we’ll first count the head `1`, then look down the right child. This sub-bush has type `int Bush.t Bush.t`, but how many elements does it contain? Well, it depends on perspective: it contains 1 `int Bush.t` and 2 `int`s – in the context of our overall problem, we care about the number of `int`s. This suggests that we’ll need to keep some value representing our nesting depth, to tell us how far to unwrap before we get to a leaf that we can count as an “element” of the original tree.

From here on, I’m going to change the bush type to use list syntax because nested `Cons ...` makes my eyes unhappy:

``````module Bush = struct
type 'a t = [] | ( :: ) of 'a * 'a t t
``````

We can store our nesting depth as a natural number, with type parameters that reflect the nesting depth being referred to by the value:

``````  (* Peano naturals, typed such that the [n]-th integer is ('z t ... t, 'z) with
[n] applications of [t]. *)
type (_, _) nat = Z : ('z, 'z) nat | S : ('a, 'z) nat -> ('a t, 'z) nat
``````

Now we stand a chance at implementing `size`, we just have to be careful to update our nesting depth appropriately as we traverse the tree. (In this case, I’m taking “nesting depth” to be the number of times the head value has been wrapped – or, equivalently, the number of intermediate nodes on the left spine.) Fortunately, the type system forces us into the correct solution almost everywhere.

``````  (* Count the number of [a]s in the given bush. *)
let size : type a. a t -> int =
(* [aux acc n t] counts the number of elements in [t] and adds it to [acc],
given that there are [n] intermediate nodes between [t] and its leftmost
leaf. *)
let rec aux : type b. int -> (b, a) nat -> b t -> int =
(* A [(b, a) nat] means we're counting [a]s in a bush of type [b]. *)
fun acc n t ->
match (n, t) with
| _, [] -> acc
| Z, _ :: xs ->
(* Value on the left is a leaf: count it, then recurse on the right.
(We'll have [S Z = 1] intermediate node between the right child and any
values on its left spine.) *)
aux (acc + 1) (S Z) xs
| S n, x :: xs ->
(* Count over the left subbush, decrementing size of the left spine ... *)
let acc = aux acc n x in
(* ... then over the right subbush, _incrementing_ size of the left spine. *)
aux acc (S (S n)) xs
in
(* At the root: no intermediate nodes between [t] and its leftmost leaf. *)
fun t -> aux 0 Z t
``````

The case of `member` is similar, except this time the user must supply us with the structured value to tell us the type of leaves that they expect. In this case, an “index” is really a finite path of `Left | Right` choices terminating in the value that we want. The nature of bushes is that each path we choose fixes the relationship between the type of the root bush and the type of the value being pointed to: each time we go `Left`, we lose a layer of nesting; each time we go `Right`, we gain one. We can reflect this using a pair of type parameters:

``````  (* An [('a, 'z)] index is a path from the root of a bush of type ['a] to a value of type ['z]. *)
type (_, _) index =
| Z : ('z, 'z) index (* [Z] refers to the head element of a bush (c.f. index 0 of a list). *)
| L : ('a, 'z) index -> ('a t, 'z) index (* [L path] means "Go left, then follow [path]" *)
| R : ('a t, 'z) index -> ('a, 'z) index (* [R path] means "Go right, then follow [path]" *)
``````

The user will give us one of these paths, and we’ll just attempt to follow it:

``````  let rec member : type a z. (a, z) index -> a t -> z =
fun i t ->
match (i, t) with
| _, [] -> raise Not_found
| Z, x :: _ -> x (* We've arrived! Return the head of the bush. *)
| L n, xs :: _ -> member n xs (* Move left *)
| R n, _ :: ys -> member n ys (* Move right *)
end
``````

Some test cases for the above:

``````let () =
let pr fmt = Format.kasprintf print_endline fmt in
let t1 =
(* Example taken from "Nested Datatypes", Bird & Meertens. *)
Bush.
[
4;
[ 8; [ 5 ]; [ [ 3 ] ] ];
[ [ 7 ]; []; [ [ [ 7 ] ] ] ];
[ [ []; [ [ 0 ] ] ] ];
]
in
pr "size: %d" (Bush.size t1) (* = 7 *);
pr "L: %d" (Bush.member Z t1 (* = 4 *));
pr "L: %d" (Bush.member (R (L Z)) t1) (* = 8 *);
pr "L: %d" (Bush.member (R (L (R (L Z)))) t1) (* = 5 *);
pr "L: %d" (Bush.member (R (R (R (L (L (L Z)))))) t1) (* = Not_found *)
``````

I believe the `Lush.t` case is conceptually quite similar:

• when recursing, use auxilliary values to track nesting depth;
• when indexing, consume a well-typed path that relates the source type (the type of the root) to the destination type (the type of the value to be returned), and follow it in the obvious way.

You may be interested in Oleg’s `hlist` library, which applies these principles to the case of heterogeneous lists. (He’s referring to the structured indices as “lenses”, which calls out to the more general concept of “optics” in functional programming.)

I hope some of this helps!

3 Likes

@CraigFe
Of course it helps I will try to complete `Lush` without help, reading your answer until it’s crystal clear.
It’s harder than the Okasaki (Purely functional data structures) book content. No real surprise, i expected it to be.
I have some experience with indexed types, GADTs and Coq and this is not luxury, it looks more like a necessity.

@monoidoid
Your alternate proposition is very welcomed. It looks much more familiar to me (more programming, less logic). If i can do a `map` chances are i can also do a `fold`. The result may not be always optimally efficient i very much like how it is beginner-friendly. I will prioritize this way and see how far i can go. My intuition is that `member` requires a lazy catamorphism rather than the same `fold_right` catamorphism.

1 Like

I think it might be easier if you define a fold first and then use that for defining other functions. In your size example for Bush, what you could do is map `size` on `att`, then you have `int t`, and then you would like to sum all these `int`s. You already defined a map, but the last part requires a fold.

``````(* Bush *)
type 'a t =
| []
| ( :: ) of 'a * 'a t t

let rec fold_right : 'a. ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b =
fun f at z -> match at with
| [] -> z
| a :: att -> f a (fold_right (fold_right f) att z)

let size at = fold_right (fun _ n -> n + 1) at 0

``````
2 Likes

@monoidoid

``````# size ([[];[[]]]);;
- : int = 2
``````

The correct answer should be `1`, isn’t it ?

`[[];[[]]]` has type `int t t t`. So `size` will count `int t t`s, and I see two of them: `[]` and `[]`.

1 Like

@monoidoid of course stupid me