Is there any kind of guideline about when to use polymorphic variants?

A typical use case is when you need many slight variations of a sum type and you want to avoid duplicating the whole sum type for each one (this is one of the original motivations, as seen in LablTk, an OCaml interface for Tcl/Tk: garrigue/labltk on GitHub).

For this and other use cases, you can skim over Section 3 of https://caml.inria.fr/pub/papers/garrigue-polymorphic_variants-ml98.pdf

Cheers,
Nicolas


My major use case is with the result type. All my functions that return a result look something like: val foo : something -> (r, [> err ]) result

This combines well with monads over result. I know some people in the community disagree but I find this style really powerful.
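A minimal sketch of this style, assuming OCaml 4.08 or later for Result.bind and binding operators. The function names and error tags are hypothetical, chosen only for illustration:

```ocaml
(* Each function declares only the errors it can produce, as an open type. *)
let parse_int (s : string) : (int, [> `Not_an_int of string ]) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Not_an_int s)

let check_positive (n : int) : (int, [> `Not_positive of int ]) result =
  if n > 0 then Ok n else Error (`Not_positive n)

let ( let* ) = Result.bind

(* The inferred error type is the union of both error sets;
   no wrapper type or error conversion is written by hand. *)
let parse_positive s =
  let* n = parse_int s in
  check_positive n

(* The caller handles exactly the errors that can occur. *)
let describe = function
  | Ok n -> Printf.sprintf "ok %d" n
  | Error (`Not_an_int s) -> Printf.sprintf "not an int: %s" s
  | Error (`Not_positive n) -> Printf.sprintf "not positive: %d" n
```

If parse_positive later gains a third error, describe stops type-checking until the new case is handled.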

The other primary use case is when I know I’ll be narrowing the variants as a function call progresses.
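As a rough illustration of narrowing (the token type and function names are made up for this sketch), an early stage can discard some cases and record that fact in its result type:

```ocaml
type token = [ `Comment of string | `Space | `Word of string ]
type word = [ `Word of string ]

(* Drops layout tokens; the result type records that only words remain. *)
let significant (tokens : token list) : word list =
  List.filter_map
    (function
      | `Comment _ | `Space -> None
      | #word as w -> Some w)
    tokens

(* Later stages take the narrowed type, so they need no dead branches. *)
let text (ws : word list) : string =
  String.concat " " (List.map (fun (`Word s) -> s) ws)
```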

Otherwise, I tend to use normal variants.


My own experience is to start with polymorphic variants first, especially if the data type is simple. For example, many functions need to return some result that may require just a little more than an option type - perhaps 3 or 4 possible states. It’s much easier not to have to define this type.

Later on, if the type expands and becomes complex, or if it’s shared by multiple functions, I prefer to have it be more concrete.
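A small sketch of that progression, using a hypothetical lookup function with four outcomes. The first definition needs no type declaration; the named type is what one might add once the result is shared:

```ocaml
(* Step 1: return ad hoc tags; the type is inferred structurally. *)
let find_user (db : (string * bool) list) (name : string) =
  if name = "" then `Invalid_name
  else
    match List.assoc_opt name db with
    | None -> `Not_found
    | Some false -> `Disabled
    | Some true -> `Active

(* Step 2: once several functions share the result, name the type
   and use it in annotations and signatures. *)
type lookup = [ `Invalid_name | `Not_found | `Disabled | `Active ]

let describe : lookup -> string = function
  | `Invalid_name -> "invalid name"
  | `Not_found -> "not found"
  | `Disabled -> "disabled"
  | `Active -> "active"
```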

But it’s often just a matter of judgement. It is a good fit for library interfaces because you don’t have to find the exact module that exports the type. A good example is the yojson library.

This is exactly the kind of use of polymorphic variants that makes me uncomfortable! :rofl:

I’m very fond of clearly defined types and even more fond of namespaces, so using polymorphic variants as a way to avoid these things makes me squirm!

I’m really looking for cases where it makes sense to define interfaces in terms of intersections of variants. I believe there are things which are possible to express with polymorphic variants which can’t be expressed with normal variants, but I have trouble figuring out real-world use cases, though some commenters have mentioned a few.

Poly vars are essential IMO for cases where what would otherwise be a monolithic variant type with many constructors can be the composition of many other smaller types, often defined in diverse modules. This lets those modules focus on their particular domains, rather than having to (usually poorly) handle all possible cases in a far larger variant type.
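A sketch of that kind of composition, with two hypothetical modules that each own a small error type and know nothing about each other:

```ocaml
module Net = struct
  type error = [ `Timeout | `Refused of string ]

  let to_string : error -> string = function
    | `Timeout -> "timeout"
    | `Refused host -> "refused by " ^ host
end

module Parse = struct
  type error = [ `Syntax of int ]

  let to_string : error -> string = function
    | `Syntax line -> Printf.sprintf "syntax error on line %d" line
end

(* The application-level type is just the union of the smaller ones. *)
type error = [ Net.error | Parse.error ]

(* Each case is dispatched to the module that owns it. *)
let to_string : error -> string = function
  | #Net.error as e -> Net.to_string e
  | #Parse.error as e -> Parse.to_string e
```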


(it was years ago but)

Encoding this state-machine:

https://wr.mondet.org/smondet/ocaml2015/img/target_graph.jpg (from these slides)

using polymorphic variants to enforce that “a given state can come only from certain previous states” →

https://github.com/hammerlab/ketrew/blob/master/src/pure/target.ml#L169
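The idea can be sketched in a few lines. This is not the ketrew code, only a simplified illustration with invented state names, where each state carries its predecessor and the payload type restricts which predecessors are allowed:

```ocaml
type created = [ `Created of string ]
type started = [ `Started of created ] (* only reachable from created *)
type finished = [ `Finished of started ] (* only reachable from started *)
type failed = [ `Failed of [ created | started ] ] (* from either *)
type state = [ created | started | finished | failed ]

let create name : created = `Created name
let start (c : created) : started = `Started c
let finish (s : started) : finished = `Finished s
let fail (s : [ created | started ]) : failed = `Failed s

(* Walk the history back to the name given at creation. *)
let rec name : state -> string = function
  | `Created n -> n
  | `Started c -> name (c : created :> state)
  | `Finished s -> name (s : started :> state)
  | `Failed s -> name (s : [ created | started ] :> state)

(* finish (create "job") is rejected by the type checker:
   a created state cannot be finished without being started. *)
```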


I like this. Using polymorphic variants as a form of constraint is great. I often forget to think in terms of constraints when programming, but I’m always happy when I do because it gives me much more of a feeling of security that my program is actually doing what I expect it to do.

The main ingredient polymorphic variants bring to the table is subtyping, which is not available for standard sum types. It is possible to write functions that act only on some of the variants defined. The variants don’t belong to a particular type but form an implicitly defined universe of variants. It would be quite difficult to define a type that represents HTML without using polymorphic variants because what tags are legal in what context in HTML is quite flexible.
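A toy version of the HTML case, with a deliberately tiny and hypothetical set of tags (real HTML content models are far richer). Inline content cannot contain block content, and each renderer handles only its own subset:

```ocaml
type inline = [ `Text of string | `Em of inline list ]
type block = [ `P of inline list | `Div of block list ]
type flow = [ inline | block ]

let rec render_inline : inline -> string = function
  | `Text s -> s
  | `Em xs -> "<em>" ^ String.concat "" (List.map render_inline xs) ^ "</em>"

let rec render_block : block -> string = function
  | `P xs -> "<p>" ^ String.concat "" (List.map render_inline xs) ^ "</p>"
  | `Div bs -> "<div>" ^ String.concat "" (List.map render_block bs) ^ "</div>"

(* A function on the larger type delegates to the subset renderers. *)
let render_flow : flow -> string = function
  | #inline as i -> render_inline i
  | #block as b -> render_block b

(* `Em [ `P [] ] does not type-check as inline: a paragraph is not inline. *)
```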


One case where I was grateful that other libraries used polymorphic variants was in my textmate-language package. I was able to create a union polymorphic variant of various JSON and plist types from different libraries. Then, I wrote a single reader function that handled all of them at once and used subtyping on this function to define readers for the ezjsonm/yojson/plist-xml types. All this was possible without pulling in any of these libraries as actual dependencies.

This approach breaks if a library uses a nominal type, in which case I must pull in the library as a dependency, or if two libraries use the same polymorphic variant tag with different payloads (e.g. | `Assoc of (string * t) list vs | `Assoc of (string * t) array).
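The technique can be sketched as follows. The two input types are redeclared locally and only modelled on the shapes of Ezjsonm.value and Yojson.Basic.t; this is not the textmate-language code, and the reader function is a stand-in that just collects strings (List.concat_map assumes OCaml 4.10 or later):

```ocaml
(* Local redeclarations; structurally equal types from a library would
   be accepted without depending on that library. *)
type ez =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of ez list | `O of (string * ez) list ]

type yo =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `List of yo list | `Assoc of (string * yo) list ]

(* The union of both shapes. *)
type any =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `A of any list | `O of (string * any) list
  | `List of any list | `Assoc of (string * any) list ]

(* One reader for the union. *)
let rec strings : any -> string list = function
  | `String s -> [ s ]
  | `A xs | `List xs -> List.concat_map strings xs
  | `O kvs | `Assoc kvs -> List.concat_map (fun (_, v) -> strings v) kvs
  | `Null | `Bool _ | `Int _ | `Float _ -> []

(* Readers for each input type are obtained by subtyping coercion. *)
let strings_of_ez (j : ez) = strings (j : ez :> any)
let strings_of_yo (j : yo) = strings (j : yo :> any)
```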


I have some more use cases than the ones mentioned. Firstly, I like to use polymorphic variant types as phantom types in more type-safe library interfaces; here the subtyping is very useful and simple to understand compared with GADTs. Phantom types also make it possible to wrap an efficient implementation in a type signature that is gone at runtime.
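A minimal sketch of the phantom-type pattern, using a hypothetical Buf module that wraps the standard Buffer and tracks read and write permissions in a type parameter that has no runtime representation:

```ocaml
module Buf : sig
  type 'perm t

  val create : unit -> [ `Read | `Write ] t
  val read_only : [> `Read ] t -> [ `Read ] t
  val add : [> `Write ] t -> string -> unit
  val contents : [> `Read ] t -> string
end = struct
  (* The permission parameter is phantom: at runtime this is a Buffer.t. *)
  type 'perm t = Buffer.t

  let create () = Buffer.create 16
  let read_only b = b
  let add b s = Buffer.add_string b s
  let contents b = Buffer.contents b
end

let () =
  let b = Buf.create () in
  Buf.add b "hello";
  let ro = Buf.read_only b in
  print_endline (Buf.contents ro)
  (* Buf.add ro "!" would be rejected: [ `Read ] lacks the `Write tag. *)
```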

I use FRP (react) in a bunch of applications, and in my attained style of writing it, I use polymorphic variants all over the place. I operate on all kinds of local (per function) and combined (per function) data (coming from different FRP sources).
Here it’s really useful to avoid defining new variant types, and just define the type structurally. Having a type-definition (or more…) per function would be bloat.


I use polymorphic variants when I want to distinguish several datatypes t1, t2 etc., but they logically share some of their constructors, and in particular I am interested in turning a t1 into a t2 by handling just the constructors that differ, and having a simple “otherwise return the input” other case.

type common_t = [
  | `A of bool
  | `B of float
]
type t1 = [
  | common_t
  | `C1 of int
]
type t2 = [
  | common_t
  | `C2 of string
]
let transform : t1 -> t2 = function
  (* one case for all common constructors *)
  | #common_t as v -> v
  (* the interesting, non-common cases *)
  | `C1 n -> `C2 (string_of_int n)

Advanced language features come with their own usability costs, so I avoid them – including polymorphic variants – whenever it is easy to do so. Maybe the common cases are simple enough, or the number of different versions is low enough, that just using normal variants is enough; then I do it. Either several distinct variant types, or just one variant type that allows all constructors at once. But when the number of software defects due to allowing everything at once becomes high, polymorphic variants are a nice solution to reason statically on the variants without too much duplication.


There are already many good answers.
Let me give my take as original designer, and also as target for complaints when things go awry.

First, the benefits.

  • Polymorphic variants allow subtyping.
    This allows much flexibility on how to process values, in particular when interfacing with functions from a less typed world, where some values are only allowed in some context. This was the original motivation, and LablGL is maybe the best example for this use. There are also much more involved applications, such as syntax trees.
  • Their typing is structural.
    This was not the original goal, but comes in very handy when you want to combine things a posteriori. For instance, as somebody already answered, combine types from several libraries into one. It has also been used in interfaces to allow using constructors without opening a module. This was a workaround for the absence of type disambiguation of normal variants, which became available only recently.

Now, for the downsides.

  • Error messages can be hard to read. If your type contains more than a few constructors, you need to define it, polymorphic or not.
  • Worse than just error messages, in the absence of type annotations the code can become pure spaghetti, with typing just an afterthought. Once again, you need to define types.
  • It indeed blurs the source of a type, but the same argument was used against type disambiguation. You have to balance comfort and tidiness. For this one, tools could help, to tell you for instance which types and functions contain a specific constructor.
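
To make the point about annotations concrete, here is a small sketch with a made-up status type. With an annotation, a misspelled tag is a compile-time error; without one, and with a catch-all case, the typo silently becomes a new tag:

```ocaml
type status = [ `Ok | `Error of string ]

(* Annotated: writing `Eror here would not type-check. *)
let show : status -> string = function
  | `Ok -> "ok"
  | `Error msg -> "error: " ^ msg

(* Unannotated, with a catch-all: the misspelled `Eror is accepted as
   a new tag, and real `Error values fall through to the last case. *)
let show_loose = function
  | `Ok -> "ok"
  | `Eror msg -> "error: " ^ msg
  | _ -> "unknown"
```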

There is one more downside: the runtime cost. A value Pair (x,y) occupies 3 words in memory, while a value `Pair (x,y) occupies 6 words. Indeed, a polymorphic variant uses a full-word tag instead of merging it with the GC metadata, and it carries a single payload, so there is a pointer indirection to the actual block containing the pair.


I’d be interested to see an example of this, do you have a pointer?

Note that the benefit is only with the following declaration, in which the “tuple” is not detachable:

type t = Pair of int * int  (* two arguments; allocates only one block of memory *)

The following is closer to the polymorphic variant because it also uses two blocks of memory. The subtlety in the syntax is unfortunate:

type t = Pair of (int * int)  (* one argument which is a tuple; allocates two blocks of memory *)
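
The three layouts can be inspected with the Obj module. This is only a sketch for exploring the runtime representation (Obj is unsafe and not meant for application code); the type names flat and boxed and the helper are introduced here for illustration. Each block is counted as its fields plus one header word:

```ocaml
type flat = Pair of int * int (* two arguments: one block *)
type boxed = Boxed of (int * int) (* one tuple argument: two blocks *)

(* Words used by a single block: its fields plus one header word. *)
let words_of_block (o : Obj.t) = Obj.size o + 1

let flat_words = words_of_block (Obj.repr (Pair (1, 2)))

let boxed_words =
  let outer = Obj.repr (Boxed (1, 2)) in
  words_of_block outer + words_of_block (Obj.field outer 0)

(* A polymorphic variant with an argument is a block holding the tag
   hash in field 0 and a single payload in field 1. *)
let poly_words =
  let outer = Obj.repr (`Pair (1, 2)) in
  words_of_block outer + words_of_block (Obj.field outer 1)

let () =
  Printf.printf "flat=%d boxed=%d poly=%d\n" flat_words boxed_words poly_words
```

The flat and polymorphic counts correspond to the 3 and 6 words mentioned above; the boxed form sits in between because its outer block has one field rather than two.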

Not opensourced currently, but here is a snippet that exemplifies something I do a lot:

let recorded_groups_s =
  let record acc (event, (tick, active_id)) = match event with
    | `Toggle_recording toggle ->
      if toggle then
        Some []
      else 
        let acc = acc |> CCOption.to_list |> CCList.flatten in
        Some ((`Tick tick, None) :: acc)
    | `Active_group_id active_id ->
      acc |> CCOption.map (fun recording ->
        (`Tick tick, Some active_id) :: recording
      )
  in
  let sampling = E.select [
    toggle_recording_e  |> E.map (fun toggle -> `Toggle_recording toggle);
    M_group.active_id_e |> E.map (fun id -> `Active_group_id id) 
  ]
  and sampled = S.l2 ~eq:C.Eq.never C.Tuple.mk2
      I.Frame.tick_s
      M_group.active_id_s 
  and init = None
  in
  S.sample C.Tuple.mk2 sampling sampled
  |> E.fold record init
  |> E.map (CCOption.map T.Gseq.of_recording)
  |> S.hold init

So the mixed sampling events I depend upon are merged with E.select or E.merge, and wrapped in new polymorphic variants. I like this because:

  • it makes the record function explicit about the events that are handled
  • the mapped events can be simplified before being passed on, for conciseness (not done here)
  • any set of events can be merged easily

Recently I’ve been using polymorphic variants as phantom types on GADTs to allow writing functions that selectively operate on subsets of the constructors of the GADT.

I’ve been using this to provide a slightly more usable interface to SQL from OCaml:


...
and (_, !'res) query =
  | SELECT_CORE : { .. } -> ('a, [> `SELECT_CORE] as 'res) query
  | SELECT : { .. } -> ('a, [> `SELECT] as 'res) query
  | DELETE : { .. } -> (unit, [> `DELETE] as 'res) query
  | UPDATE : { .. } -> (unit, [> `UPDATE] as 'res) query
  | INSERT : { .. } -> (unit, [> `INSERT] as 'res) query

The internals of this GADT are not exposed in the interface, but rather the user can construct terms of this type using combinators that I define.

Then, I can define my functions in terms of the subsets of these constructors they can handle - for example, in the syntax of SQL, you can only perform group by on simple select statements - not inserts, updates or deletes:

type ('a,'b,'c) group_by_fun =
  'b expr_list -> ('c, 'a) query -> ('c, 'a) query constraint 'a = ([< `SELECT_CORE | `SELECT ] as 'a)

let group_by : ([< `SELECT | `SELECT_CORE ], 'b, 'c) group_by_fun =
  fun by (type a b) (table : (b, a) query) : (b, a) query ->
  match table with
  | SELECT_CORE { .. } -> ..
  | SELECT { .. } ->     SELECT { .. }
  | DELETE _ 
  | UPDATE _ 
  | INSERT _ -> invalid_arg "group by only supported on select clause"

A particularly cool aspect of this encoding, which goes beyond the capabilities provided by polymorphic variants alone, is that you can write functions (such as group_by above) that preserve the tags of their arguments in their outputs, i.e. if you pass in a query with type (_, [> `SELECT]) query, then the type ensures you will get (_, [> `SELECT]) query out (and the same for `SELECT_CORE respectively).

Putting it all together you get quite a nice ergonomic interface to SQL that uses the type system to capture certain syntactic well-formedness properties of the SQL queries that you construct:

Sql.select [instance_id; args_list] ~from:Tables.clause_data
|> Sql.group_by [instance_id_ref]
|> Sql.order_by ~direction:`ASC Tables.data_arg_index

This looks like a super interesting module that lifts SQL into OCaml. I would love to see that as an Opam package.


Thanks for the kind words! Yeah, that would be a good idea - there’s just a small design choice with regards to versioning and migration that I’m still ironing out, but I should have something standalone soonish.


Managed to get round to publishing the SQL library as a standalone artefact: [ANN] Petrol 1.0.0 - A high-level typed SQL API for OCaml designed to go fast!
