Hi,
I am here to share some though about the type system of sum types in OCaml.
Disclaimer: I have not followed the language development for a while so, sorry if something similar have already been discussed elsewhere.
Context: I recently opened the Pandora’s box by playing with polymorphic variants and I was stunned by their capacities and how easy it is to restrict or extend a type using them. Yet, they have some drawback and they suffer from a bad reputation so for now, I can not expect to use them in my collaborative projects without annoying the other people. So, I am looking forward for some trade-off between the 3 ways for declaring sum types (ok, it is actually 2.5 since GADTs are parts of normal variants) and maybe, some cool new features.
Motivating example
Here is an oversimplified example to illustrate the main features I wish to be present in OCaml.
module Expr : sig
type t = Variable of string
| Plus of t list
val pp: Format.formatter -> t -> unit
end
module Equation : sig
type t = Equal of Expr.t * Expr.t
| Diff of Expr.t * Expr.t
end
module System : sig
type t = Equation.t list
val parse : string -> t
val empty: t
val add : Equation.t -> t -> t
val solve : t -> (string * int) list
end
It is not rocket science, I think it is a basic usage of variants.
Here are additional (arbitrary) constrains:
- the types should be finite to rely on exhaustive pattern matching;
- the types already exist and can not be easily modified;
- GADTs scare some people.
Sub-typing
Let say that some sequences of variants make no sense or are not optimal in some context.
I want to implement some syntactic simplifications and for instance, the expression Plus [ Variable "a"; Plus [ Variable "b"; Variable "c" ] ]
can be flattened: Plus [ Variable "a"; Variable "b"; Variable "c" ]
.
I want to go further such as there is an invariant that enforces all the linear addition to be flattened.
module Var : sig
type t = subtype of Expr.t = Variable of string
end
module NormExpr : sig
type t = subtype of Expr.t = Variable of string | Plus of Var.t list
end
The definition is valid because the compiler can prove that subtype always contains less variant than the type they are based on.
Now, when I will pattern match on a NormExpr
, the compiler will warn me if I try to pattern match a non flattened addition. Still, I can cast NormExpr
to Expr
to reuse any function defined for the latter.
Note: Here it is trivial but enforcing invariant in the AST of an normalized expression can be very valuable for complex types.
Current status:
- standard variant: impossible
- polymorphic variant: possible but runtime will be slower
- GADT: I know a way to encode such invariant with phantom type but it is pretty ugly for complex expressions and force to declare a lot of universally quantified (unboxed) single variant types.
Super-typing
For all the existing code, it makes no sense to have an additional variant, but for a very specific usage, I would like to extend the type of Equation
.
module ExtEquation : sig
type t = supertype of Equation.t
= Equal of Expr.t * Expr.t
| Diff of Expr.t * Expr.t
| Interval of { lo: int; e: Expr.t; hi: int }
end
module ExtSystem : sig
type t = ExtEquation.t list
val add : ExtEquation.t -> t -> t
val solve : t -> (string * int) list
end
Here, the point is that ExtEquation
accept every variants of Equation
so, ExtSystem
is also a supertype of System
. It is then possible to solve a System
using function from ExtSystem
without having to recreate it.
Current status:
- polymorphic variant: possible but runtime will be slower
- standard variant & GADT: without modifying the base type, it is possible to create a new type like:
type t = Basic of Equation.t
| Interval of { lo: int; e: Expr.t; hi: int }
This approach has three drawbacks:
- it is less memory efficient (especially if you want to extend the type multiple times)
- you have to make an explicit converted
val ext: Equation.t -> ExtEquation.t
- it forces you to “recreate” every structure that contains an
Equation
. Here, it is a simple list but ifSystem
was a more complex structure (eg. graph) it could have a high cost.
Mini conclusion
The two features are more or less already implemented by polymorphic variant. My concern about them is they are too general for what I want: I do not need arbitrary variants but just more flexibility over a set of well defined variants.
I can imagine two ways to improve this:
- add an optimization pass that will make polymorphic variants to use the same representation as standard variant if the set is finite and well know at compile time;
- add more flexibility to standard variants with a mechanism of
subtype of
andsupertype of
Regards,
Frédéric Recoules