Playing with extensible type

Playing with extensible types

I’ve just found a nice trick related to extensible types (type t = ..) and I’m not sure whether it is known or not. As for me, I have not seen it.

So I will demo this trick with a toy example. An eval for an extremely simple AST.

I will start with the following type:

type t = ..
type t += 
	| Int of int

and define an eval function for it. Later I will add Add and Mul constructors to that type and redefine eval, but I will pattern-match only on added constructors and re-use the already defined evals.

And yes, we have to write the eval in a special way

let eval ?next t = match t with
  | Int n -> n
  | _  -> match next with
    | None -> failwith "."
   | Some f -> f t

Adding additions

type t +=
  | Add of t * t

let eval_v1 = eval
let rec eval ?next t =
  let next = Option.value next ~default:(eval ?next:None) in
  match t with
  | Add (e1, e2) -> eval ~next e1 + eval ~next e2
  | _ -> eval_v1 ~next t

Adding multiplications

type t +=
  | Mul of t * t

let eval_v2 = eval
let rec eval ?next t =
  let next = Option.value next ~default:(eval ?next:None) in
  match t with
  | Mul (e1, e2) -> eval ~next e1 * eval ~next e2
  | _ -> eval_v2 ~next t

And that’s it.

PS. Edited to add the missed case. Thanks @brandon !

2 Likes

This is neat! I like it.

It looks like one line was cut off, and the base eval ends up with a non-exhaustive pattern matching. Here it is with the missing case added back:

let eval ?next t = match t with
  | Int n -> n
  | _ -> match next with
    | None -> failwith "."
    | Some eval -> eval t

The chaining is more visibly immediate to me when the default value is given at the parameter. It didn’t click with me until I wrote it this way.

type t = ..
type t += Int of int

let eval ?(next = fun _ -> failwith ".") = function
  | Int n -> n
  | t -> next t

type t += Add of t * t

let eval_v1 = eval
let rec eval ?(next = eval ?next:None) = function
  | Add (e1, e2) -> eval ~next e1 + eval ~next e2
  | t -> eval_v1 ~next t
1 Like

You are right. One line was cut off during cut-and-paste.

And thanks for improving the base eval. It is certainly better this way.

Version 2 with proposed improvements:

type t = ..
type t += Int of int

let eval ?(next = fun _ -> failwith ".") = function
  | Int n -> n
  | t -> next t

type t += Add of t * t

let eval_v1 = eval
let rec eval ?(next = eval ?next:None) = function
  | Add (e1, e2) -> eval ~next e1 + eval ~next e2
  | t -> eval_v1 ~next t

type t += Mul of t * t

let eval_v2 = eval
let rec eval ?(next = eval ?next:None) = function
  | Mul (e1, e2) -> eval ~next e1 * eval ~next e2
  | t -> eval_v2 ~next t

In a real setting if you are going to bother using it, every eval will be defined in its own file/module and you will use SomeM.eval not eval_vN

Also some tests.

module Test = struct
  let _1, _2, _3, _4, _5 = Int 1, Int 2, Int 3, Int 4, Int 5

  let ( + ) e1 e2 = Add (e1, e2)
  let ( * ) e1 e2 = Mul (e1, e2)

  let _ = assert (_1 + _2 + _3 + _4 + _5 |> eval = 15)
  let _ = assert (_1 + _3 * _5 |> eval = 16)
  let _ = assert (_5 * _4 + _3 |> eval = 23)
  let _ = assert (_5 * (_4 + _3) |> eval = 35)
  let _ = print_endline "Tests passed."
end

Nice trick!
Here is an alternate implementation, based on ideas from the OOP world:

type t = ..
type t += Int of int

type eval_self =
  | Self of (eval_self -> t -> int)

let eval (Self _self) = function
  | Int n -> n
  | _ -> failwith "."

type t += Add of t * t

let eval (Self self) = function
  | Add (e1, e2) -> self (Self self) e1 + self (Self self) e2
  | t -> eval (Self self) t

type t += Mul of t * t

let eval (Self self) = function
  | Mul (e1, e2) -> self (Self self) e1 * self (Self self) e2
  | t -> eval (Self self) t

(* Final binding: close the loop *)
let eval = eval (Self eval)

Compared to your implementation, mine always goes back immediately to the final version in recursive cases, which makes it possible to reliably override earlier implementations (there isn’t much that you could override in this tiny example, but you could consider for instance forbidding negative numbers or adding overflow checks).
Both versions cannot reliably handle multiple parallel extensions though: if two modules extend the type independently, and do not expose the actual new constructors (introducing functions to build new values instead), you can create expressions that use both extensions but getting an evaluation function supporting both extensions is going to be hard.
With my proposal, one possibility would be to use a dedicated exception in the initial function and catch it in a wrapper:

exception Unknown_constructor

let eval (Self _self) = function
  | Int n -> n
  | _ -> raise Unknown_constructor

module M1 = struct (* ... *) end
module M2 = struct (* ... *) end

module M3 = struct
  let eval (Self self) t =
    try M1.eval (Self self) t with
    | Unknown_constructor -> M2.eval (Self self) t
end

This is not very satisfying though, as if M1 and M2 override existing constructors only M1’s overridings will be taken into account.

Yes, this is for example how cmarkit’s AST mapper and folder abstractions handle AST extensions (see the *_ext_default fields for handling the defaults folding of extensions, note this line is misindented).

In OCaml I would trace at least trace it to this 2000 paper by Jacques Garrigue which tries to tackle the famous expression problem (using polymorphic variants there).

Somebody mentioned the expression problem? Here is my take: curious-ocaml/chapter11/functional-lecture11.pdf at main · lukstafi/curious-ocaml · GitHub

1 Like

I totally agree that this scheme is not well suited to “diverging and converging” extension. So be it.

As for your approach I guess it can be simplified if we do not insist that everything is called eval

type t = ..

type t += Int of int
let evalInt _eval = function
  | Int n -> n
  | _ -> failwith "."

type t += Add of t * t
let evalAdd eval = function
  | Add (e1, e2) -> eval e1 + eval e2
  | t -> evalInt eval t

type t += Mul of t * t
let evalMul eval = function
  | Mul (e1, e2) -> eval e1 * eval e2
  | t -> evalAdd eval t

(* Tying the knot *)
let rec eval t = evalMul eval t

What matters here, I guess, is the “late-binding” of eval

Speaking of which I remember writing a recursive descent parser in that style, which allowed me avoid (explicit) recursion until the last step. Which turned out to be more pleasant than having a bunch let rec parse_a.. and parse_b ... and .. and parse_z

You can also do

type t = ..

type t += Int of int

let evalInt _eval = function
  | Int n -> Some n
  | _ -> None

type t += Add of t * t
let evalAdd eval = function
  | Add (e1, e2) -> Some (eval e1 + eval e2)
  | _ -> None

type t += Mul of t * t
let evalMul eval = function
  | Mul (e1, e2) -> Some (eval e1 * eval e2)
  | _ -> None

let rec eval t =
  match List.find_map (fun subeval -> subeval eval t) [evalInt;evalAdd;evalMul] with
  | Some v -> v
  | None -> failwith "."

or even

type t = ..

let subevals = ref []

let add_subeval f = subevals := f :: !subevals

let eval t =
  match List.find_map (fun subeval -> subeval t) !subevals with
  | Some v -> v
  | None -> failwith "."

type t += Int of int

let () = add_subeval @@ function
  | Int n -> Some n
  | _ -> None

type t += Add of t * t
let () = add_subeval @@ function
  | Add (e1, e2) -> Some (eval e1 + eval e2)
  | _ -> None

type t += Mul of t * t
let () = add_subeval @@ function
  | Mul (e1, e2) -> Some (eval e1 * eval e2)
  | _ -> None

Each “subeval” returns Some of the result if it knows how to handle the given term, otherwise returns None.

1 Like

There is also this related example in the OCaml manual