My impression is that polymorphic variants are often used in library interfaces to avoid having to use qualified names, and this always makes me feel funny—in the sense that it would be better to just use normal variants for the sake of keeping type errors simple for my limited brainspace.
Is there a guideline for when it’s really better to use polymorphic variants?
A typical use case is when you need many slight variations of a sum type and you want to avoid duplicating the whole sum type for each one (this is one of the original motivations, found in LablTk, an OCaml interface for Tcl/Tk: https://github.com/garrigue/labltk).
My own experience is to start with polymorphic variants first, especially if the data type is simple. For example, many functions need to return some result that may require just a little more than an option type - perhaps 3 or 4 possible states. It’s much easier not to have to define this type.
Later on, if the type expands and becomes complex, or if it’s shared by multiple functions, I prefer to have it be more concrete.
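A minimal sketch of that pattern (the function and tags here are invented for illustration): the result needs a little more than an option, and no type declaration is required.

```ocaml
(* Hypothetical lookup: three possible outcomes, written inline as a
   polymorphic variant, with no standalone type definition. *)
let find_age db name =
  match List.assoc_opt name db with
  | Some age when age >= 0 -> `Found age
  | Some _ -> `Invalid
  | None -> `Not_found

let describe = function
  | `Found age -> Printf.sprintf "found, age %d" age
  | `Invalid -> "invalid record"
  | `Not_found -> "no such user"
```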
But it’s often just a matter of judgement. It is a good fit for library interfaces because you don’t have to find the exact module that exports the type. A good example is the yojson library.
This is exactly the kind of use of polymorphic variants that makes me uncomfortable!
I’m very fond of clearly defined types and even more fond of namespaces, so using polymorphic variants as a way to avoid these things makes me squirm!
I’m really looking for cases where it makes sense to define interfaces in terms of intersections of variants. I believe there are things which are possible to express with polymorphic variants but can’t be expressed with normal variants, but I have trouble figuring out real-world use cases, though some commenters have mentioned a few.
Poly vars are essential IMO for cases where what would otherwise be a monolithic variant type with many constructors can be the composition of many other smaller types, often defined in diverse modules. This lets those modules focus on their particular domains, rather than having to (usually poorly) handle all possible cases in a far larger variant type.
I like this. Using polymorphic variants as a form of constraint is great. I often forget to think in terms of constraints when programming, but I’m always happy when I do, because it gives me much more of a feeling of security that my program is actually doing what I expect it to do.
The main ingredient polymorphic variants bring to the table is subtyping, which is not available for standard sum types. It is possible to write functions that act only on some of the variants defined. The variants don’t belong to a particular type but form an implicitly defined universe of variants. It would be quite difficult to define a type that represents HTML without using polymorphic variants, because which tags are legal in which context in HTML is quite flexible.
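A small sketch of this (the types and tags are invented): one function handles only a numeric subset of the tags, and a wider type reuses the same tags plus one more.

```ocaml
type num = [ `Int of int | `Float of float ]

(* This function acts only on the two numeric tags. *)
let to_float : num -> float = function
  | `Int n -> float_of_int n
  | `Float f -> f

(* A wider type in the same "universe" of tags. *)
type value = [ num | `String of string ]

let describe : value -> string = function
  | #num as n -> string_of_float (to_float n)
  | `String s -> s
```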
One case where I was grateful that other libraries used polymorphic variants was in my textmate-language package. I was able to create a union polymorphic variant of various JSON and plist types from different libraries. Then, I wrote a single reader function that handled all of them at once and used subtyping on this function to define readers for the ezjsonm/yojson/plist-xml types. All this was possible without pulling in any of these libraries as actual dependencies.
This approach breaks down if a library uses a nominal type, so I must pull in the library as a dependency, or if two libraries use the same polymorphic variant tag with different payloads (e.g. | `Assoc of (string * t) list vs | `Assoc of (string * t) array).
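The union-plus-subtyping trick can be sketched like this (the two modules are invented stand-ins; the real ezjsonm/yojson/plist-xml types differ in detail):

```ocaml
(* Two stand-in "library" types that happen to share some tags. *)
module LibA = struct
  type t = [ `Null | `Bool of bool | `String of string ]
end

module LibB = struct
  type t = [ `Null | `Float of float | `String of string ]
end

(* The union is just another polymorphic variant type; neither
   library needs to be a real dependency for this to typecheck. *)
type union = [ LibA.t | LibB.t ]

(* One reader handles the whole union... *)
let to_string : union -> string = function
  | `Null -> "null"
  | `Bool b -> string_of_bool b
  | `Float f -> string_of_float f
  | `String s -> s

(* ...and subtyping gives per-library readers for free. *)
let liba_to_string (v : LibA.t) = to_string (v :> union)
let libb_to_string (v : LibB.t) = to_string (v :> union)
```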
I have some more use cases than the ones mentioned. Firstly, I like to use polymorphic variant types as phantom types in more typesafe library interfaces; here the subtyping is very useful and simpler to understand than GADTs. Phantom types also allow wrapping an efficient implementation in a type signature that is gone at runtime.
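A sketch of that phantom-type style (the module and its names are invented): the phantom parameter is a polymorphic variant recording permissions, and it is erased at runtime because the implementation is a bare ref.

```ocaml
module Perm : sig
  type 'a t
  (* 'a is a phantom permission parameter; it never appears in the data. *)

  val create : int -> [< `Read | `Write ] t
  val read : [> `Read ] t -> int
  val write : [> `Write ] t -> int -> unit

  (* Subtyping makes it easy to forget capabilities: *)
  val read_only : [> `Read ] t -> [ `Read ] t
end = struct
  type 'a t = int ref
  let create n = ref n
  let read r = !r
  let write r n = r := n
  let read_only r = r
end
```

Calling `Perm.write` on a `[ \`Read ] Perm.t` is then a compile-time error, while the runtime representation is just an int ref.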
I use FRP (the react library) in a bunch of applications, and in the style of writing it I’ve settled on, I use polymorphic variants all over the place. I operate on all kinds of local and combined per-function data (coming from different FRP sources).
Here it’s really useful to avoid defining new variant types, and just define the type structurally. Having a type-definition (or more…) per function would be bloat.
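What “defining the type structurally” looks like in miniature (function and tags invented): the variant is written inline in the annotation, so no per-function type declaration is needed.

```ocaml
(* The result type exists only in this annotation; there is no
   standalone declaration anywhere. *)
let classify n : [ `Neg | `Zero | `Pos of int ] =
  if n < 0 then `Neg
  else if n = 0 then `Zero
  else `Pos n
```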
I use polymorphic variants when I want to distinguish several datatypes t1, t2 etc., but they logically share some of their constructors, and in particular I am interested in turning a t1 into a t2 by handling just the constructors that differ, and having a simple “otherwise return the input” other case.
type common_t = [
| `A of bool
| `B of float
]
type t1 = [
| common_t
| `C1 of int
]
type t2 = [
| common_t
| `C2 of string
]
let transform : t1 -> t2 = function
(* one case for all common constructors *)
| #common_t as v -> v
(* the interesting, non-common cases *)
| `C1 n -> `C2 (string_of_int n)
Advanced language features come with their own usability costs, so I avoid them – including polymorphic variants – whenever it is easy to do so. Maybe the common cases are simple enough, or the number of different versions is low enough, that just using normal variants is enough; then I do it. Either several distinct variant types, or just one variant type that allows all constructors at once. But when the number of defects caused by allowing everything at once becomes high, polymorphic variants are a nice solution for reasoning statically about the variants without too much duplication.
There are already many good answers.
Let me give my take as original designer, and also as target for complaints when things go awry.
First, the benefits.
Polymorphic variants allow subtyping.
This allows much flexibility on how to process values, in particular when interfacing with functions from a less typed world, where some values are only allowed in some context. This was the original motivation, and LablGL is maybe the best example for this use. There are also much more involved applications, such as syntax trees.
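A toy sketch in the spirit of such bindings (the names and codes here are invented, not LablGL’s actual API): the underlying less-typed call takes an integer code, while the typed wrappers restrict which tags are legal in which context.

```ocaml
(* One conversion covers the whole universe of tags. *)
let code = function
  | `Points -> 0
  | `Lines -> 1
  | `Triangles -> 4
  | `Quads -> 7

(* Each hypothetical context accepts only a subset; passing `Points
   to begin_fill is a compile-time error. *)
let begin_fill (prim : [ `Triangles | `Quads ]) = code prim
let begin_outline (prim : [ `Points | `Lines ]) = code prim
```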
Their typing is structural.
This was not the original goal, but comes in very handy when you want to combine things a posteriori. For instance, as somebody already answered, combine types from several libraries into one. It has also been used in interfaces to allow using constructors without opening a module. This was a workaround for the absence of type disambiguation of normal variants, which became available only recently.
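The “no need to open the module” point in miniature (module invented for illustration): the tag belongs to no particular module, so it can be written anywhere.

```ocaml
module Color = struct
  type t = [ `Red | `Green | `Blue ]

  let to_string : t -> string = function
    | `Red -> "red"
    | `Green -> "green"
    | `Blue -> "blue"
end

(* `Red is usable here without `open Color` and without
   qualification, unlike an ordinary constructor Color.Red. *)
let s = Color.to_string `Red
```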
Now, for the downsides.
Error messages can be hard to read. If your type contains more than a few constructors, you need to define it, polymorphic or not.
Worse than just error messages, in the absence of type annotations the code can become pure spaghetti, with typing just an afterthought. Once again, you need to define types.
It indeed blurs the source of a type, but the same argument was used against type disambiguation. You have to balance comfort and tidiness. For this one, tooling could help, telling you for instance which types and functions contain a specific constructor.
There is one more downside: the runtime cost. A value Pair (x,y) occupies 3 words in memory, while a value `Pair (x,y) occupies 6 words. Indeed, a polymorphic variant uses a full-word tag instead of merging it with the GC metadata, and it carries a single payload, so there is a pointer indirection to the actual block containing the pair.
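This layout can be observed with the unsafe Obj module (implementation-dependent, for illustration only; Obj.size counts a block’s fields, excluding its one-word header):

```ocaml
type t = Pair of int * int

let () =
  (* Ordinary constructor: one block with two fields
     (2 fields + 1 header = 3 words). *)
  let ordinary = Obj.repr (Pair (1, 2)) in
  Printf.printf "ordinary fields: %d\n" (Obj.size ordinary);

  (* Polymorphic variant: a 2-field block (tag hash + pointer)
     pointing at a separate 2-field tuple block
     (2 + 2 fields + 2 headers = 6 words). *)
  let poly = Obj.repr (`Pair (1, 2)) in
  Printf.printf "poly fields: %d, payload tuple fields: %d\n"
    (Obj.size poly)
    (Obj.size (Obj.field poly 1))
```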
Recently I’ve been using polymorphic variants as phantom types on GADTs to allow writing functions that selectively operate on subsets of the constructors of the GADT.
I’ve been using this to provide a slightly more usable interface to SQL from OCaml.
The internals of this GADT are not exposed in the interface, but rather the user can construct terms of this type using combinators that I define.
Then, I can define my functions in terms of the subsets of these constructors they can handle - for example, in the syntax of SQL, you can only perform group by on simple select statements - not inserts, updates or deletes:
type ('a, 'b, 'c) group_by_fun =
  'b expr_list -> ('c, 'a) query -> ('c, 'a) query
  constraint 'a = ([< `SELECT_CORE | `SELECT ] as 'a)

let group_by : ([< `SELECT | `SELECT_CORE ], 'b, 'c) group_by_fun =
  fun by (type a b) (table : (b, a) query) : (b, a) query ->
    match table with
    | SELECT_CORE { .. } -> ..
    | SELECT { .. } -> SELECT { .. }
    | DELETE _ | UPDATE _ | INSERT _ ->
      invalid_arg "group by only supported on select clause"
A particularly cool aspect of this encoding, which goes beyond the capabilities provided by polymorphic variants alone, is that you can write functions (such as group_by above) that preserve the tags of their arguments in their outputs - i.e. if you pass in a query with type (_, [> `SELECT]) query, then the type ensures you will get a (_, [> `SELECT]) query out (and the same for `SELECT_CORE respectively).
Putting it all together, you get quite a nice ergonomic interface to SQL that uses the type system to capture certain syntactic well-formedness properties of the SQL queries that you construct.
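A self-contained sketch of the encoding described above (the types are drastically simplified and invented, not the actual library’s definitions): each GADT constructor fixes its phantom index to an open polymorphic variant, and a function can demand a subset of the tags.

```ocaml
(* The type parameter is a phantom polymorphic variant recording
   which kind of statement this is. *)
type _ query =
  | SELECT : string -> [> `SELECT ] query
  | DELETE : string -> [> `DELETE ] query
  | INSERT : string -> [> `INSERT ] query

(* Some functions handle every kind of statement... *)
let to_sql : type a. a query -> string = function
  | SELECT s -> "SELECT " ^ s
  | DELETE s -> "DELETE " ^ s
  | INSERT s -> "INSERT " ^ s

(* ...while group_by only accepts SELECTs: passing a DELETE or
   INSERT is a compile-time error, since their tags cannot fit
   the [< `SELECT ] bound. *)
let group_by : string -> [< `SELECT ] query -> string =
  fun col q ->
    match q with
    | SELECT s -> "SELECT " ^ s ^ " GROUP BY " ^ col
```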
Thanks for the kind words! Yeah, that would be a good idea - there’s just a small design choice with regards to versioning and migration that I’m still ironing out, but I should have something standalone soonish.