Why are multiple fields of polymorphic variants not flattened?

A quote from Chapter 20 (on memory representation of values) of “Real World OCaml”:

polymorphic variants must adopt a more flexible uniform memory representation, since they may be reused in a different context across compilation units.

Also a quote from this website:

Unlike constructed values, polymorphic variant values taking several arguments are not flattened. That is, ‘VConstr( v , v’ ) is represented by a block of size 2, whose field number 1 contains the representation of the pair ( v , v’ ), rather than a block of size 3 containing v and v’ in fields 1 and 2.

I don’t quite understand why this must be the case. Could anyone offer an example of what might go wrong if polymorphic variants always “inline” its fields into its block?

The syntax of polymorphic variants is ambiguous. When you write `Foo (0,1), the compiler has no way to know whether you are talking about a constructor that takes two integer arguments or a constructor that takes a single pair argument. The compiler would need to perform a global analysis of all the compilation units to detect whether it can be the latter. Since it does not, it has to assume the worst, which leads to the inefficient memory representation.

1 Like

I don’t get this.
Isn’t `Foo(1,2) and Foo(1,2) (syntactically) on the same level of ambiguity if any?

For normal constructors, the mandatory type definition makes it possible to disambiguate. For polymorphic constructors, there is no such thing.

# `Foo (1,2);;
- : [> `Foo of int * int ] = `Foo (1, 2)
# Foo (1,2);;
Error: Unbound constructor Foo
2 Likes

This makes sense, although I’m surprised that a grammar choice leads to inefficient backend code…
Why didn’t the dev team choose a different syntax for instantiating a polymorphic variant? Was there any discussion on this?

The issue cuts deeper than just the syntax. Using unboxed tuples for polymorphic variant arguments would also expose them indirectly to the type system. For instance, the type [ `Foo of #('a * 'b) ] is not compatible with [ `Foo of 'c]. In the case of normal constructor, those unboxed types are mostly kept under wraps, and only manifest themself when pattern matching using the wrong arity:

type unboxed = Foo of int * int
let error (Foo x) = x
type boxed = Foo of (int * int)
let fine (Foo x) = x
1 Like

Hmm I think I understood your point, but wouldn’t the following grammar eliminate the ambiguity and leaves the type system intact? (vertical bar is picked arbitrarily)

`Foo |e1, e2, ..., en| (* n >= 0 *)

So we have either one of following:

`Foo |(1, 2)|
`Foo |1, 2|

It doesn’t look like regular variants anymore, but it’s semantically different from those anyway, and there seems to be no backward compatibility issue if we introduced it this way (I’m just trying to understand the development process behind this, not proposing a change at all).

The type system needs to handle the situation when different part of the code base try to use the same polymorphic variant constructor with different arities. Consider for instance,

let both f x =
  let `Foo a = x in
  let `Foo |b,c| = x in
  f a + b + c

The type of both is something like: ('a -> int) -> [< `Foo of #(int * int) & 'a] -> int, where 'a and #(int *int) have different kinds and cannot be unified.

1 Like

different part of the code base try to use the same polymorphic variant constructor with different arities

This is what I was missing. Thank you so much!

I think I’m still confused. If the kinds can’t be unified, why can’t that just be a compile-time error at the point where you try to unify them? Is the issue here one of soundness, or is it “just” that it would take more work on the type system to be able to handle cases like this?

OCaml doesn’t have a layout kind system (I was using the notation from the unboxed types RFC) that would allow to express the issue as a layout mismatch. Thus the language doesn’t allow the mismatch to arise by choosing the default layout kind which is used for all first-class types.

Note that echoes of other layouts do appear currently in OCaml, but only in second-class objects like inline records, elements of an unboxed array, or argument of a n-ary constructor (with n>1).