Sum types: sub- & super-type


I am here to share some though about the type system of sum types in OCaml.

Disclaimer: I have not followed the language development for a while so, sorry if something similar have already been discussed elsewhere.

Context: I recently opened the Pandora’s box by playing with polymorphic variants and I was stunned by their capacities and how easy it is to restrict or extend a type using them. Yet, they have some drawback and they suffer from a bad reputation so for now, I can not expect to use them in my collaborative projects without annoying the other people. So, I am looking forward for some trade-off between the 3 ways for declaring sum types (ok, it is actually 2.5 since GADTs are parts of normal variants) and maybe, some cool new features.

Motivating example
Here is an oversimplified example to illustrate the main features I wish to be present in OCaml.

module Expr : sig
  type t = Variable of string
         | Plus     of t list
  val pp: Format.formatter -> t -> unit

module Equation : sig
  type t = Equal of Expr.t * Expr.t
         | Diff  of Expr.t * Expr.t

module System : sig
  type t = Equation.t list

  val parse : string -> t
  val empty: t
  val add : Equation.t -> t -> t
  val solve : t -> (string * int) list

It is not rocket science, I think it is a basic usage of variants.
Here are additional (arbitrary) constrains:

  • the types should be finite to rely on exhaustive pattern matching;
  • the types already exist and can not be easily modified;
  • GADTs scare some people.

Let say that some sequences of variants make no sense or are not optimal in some context.

I want to implement some syntactic simplifications and for instance, the expression Plus [ Variable "a"; Plus [ Variable "b"; Variable "c" ] ] can be flattened: Plus [ Variable "a"; Variable "b"; Variable "c" ].

I want to go further such as there is an invariant that enforces all the linear addition to be flattened.

module Var : sig
   type t = subtype of Expr.t = Variable of string

module NormExpr : sig
  type t = subtype of Expr.t = Variable of string | Plus of Var.t list

The definition is valid because the compiler can prove that subtype always contains less variant than the type they are based on.

Now, when I will pattern match on a NormExpr, the compiler will warn me if I try to pattern match a non flattened addition. Still, I can cast NormExpr to Expr to reuse any function defined for the latter.

Note: Here it is trivial but enforcing invariant in the AST of an normalized expression can be very valuable for complex types.

Current status:

  • standard variant: impossible
  • polymorphic variant: possible but runtime will be slower
  • GADT: I know a way to encode such invariant with phantom type but it is pretty ugly for complex expressions and force to declare a lot of universally quantified (unboxed) single variant types.

For all the existing code, it makes no sense to have an additional variant, but for a very specific usage, I would like to extend the type of Equation.

module ExtEquation : sig
  type t = supertype of Equation.t
         = Equal    of Expr.t * Expr.t
         | Diff     of Expr.t * Expr.t
         | Interval of { lo: int; e: Expr.t; hi: int }

module ExtSystem : sig
  type t = ExtEquation.t list

  val add : ExtEquation.t -> t -> t
  val solve : t -> (string * int) list

Here, the point is that ExtEquation accept every variants of Equation so, ExtSystem is also a supertype of System. It is then possible to solve a System using function from ExtSystem without having to recreate it.

Current status:

  • polymorphic variant: possible but runtime will be slower
  • standard variant & GADT: without modifying the base type, it is possible to create a new type like:
type t = Basic of Equation.t
       | Interval of { lo: int; e: Expr.t; hi: int }

This approach has three drawbacks:

  • it is less memory efficient (especially if you want to extend the type multiple times)
  • you have to make an explicit converted val ext: Equation.t -> ExtEquation.t
  • it forces you to “recreate” every structure that contains an Equation. Here, it is a simple list but if System was a more complex structure (eg. graph) it could have a high cost.

Mini conclusion
The two features are more or less already implemented by polymorphic variant. My concern about them is they are too general for what I want: I do not need arbitrary variants but just more flexibility over a set of well defined variants.

I can imagine two ways to improve this:

  • add an optimization pass that will make polymorphic variants to use the same representation as standard variant if the set is finite and well know at compile time;
  • add more flexibility to standard variants with a mechanism of subtype of and supertype of

Frédéric Recoules