Why are multiple fields of polymorphic variants not flattened?

StrongerXi · May 23, 2021, 5:36am

A quote from Chapter 20 (on memory representation of values) of “Real World OCaml”:

polymorphic variants must adopt a more flexible uniform memory representation, since they may be reused in a different context across compilation units.

Also a quote from this website:

Unlike constructed values, polymorphic variant values taking several arguments are not flattened. That is, ‘VConstr( v , v’ ) is represented by a block of size 2, whose field number 1 contains the representation of the pair ( v , v’ ), rather than a block of size 3 containing v and v’ in fields 1 and 2.

I don’t quite understand why this must be the case. Could anyone offer an example of what might go wrong if polymorphic variants always “inline” its fields into its block?

silene · May 23, 2021, 7:29am

The syntax of polymorphic variants is ambiguous. When you write `Foo (0,1), the compiler has no way to know whether you are talking about a constructor that takes two integer arguments or a constructor that takes a single pair argument. The compiler would need to perform a global analysis of all the compilation units to detect whether it can be the latter. Since it does not, it has to assume the worst, which leads to the inefficient memory representation.

hyphenrf · May 23, 2021, 10:31am

I don’t get this.
Isn’t `Foo(1,2) and Foo(1,2) (syntactically) on the same level of ambiguity if any?

silene · May 23, 2021, 11:12am

For normal constructors, the mandatory type definition makes it possible to disambiguate. For polymorphic constructors, there is no such thing.

# `Foo (1,2);;
- : [> `Foo of int * int ] = `Foo (1, 2)
# Foo (1,2);;
Error: Unbound constructor Foo

StrongerXi · May 23, 2021, 1:21pm

This makes sense, although I’m surprised that a grammar choice leads to inefficient backend code…
Why didn’t the dev team choose a different syntax for instantiating a polymorphic variant? Was there any discussion on this?

octachron · May 23, 2021, 3:33pm

The issue cuts deeper than just the syntax. Using unboxed tuples for polymorphic variant arguments would also expose them indirectly to the type system. For instance, the type [ `Foo of #('a * 'b) ] is not compatible with [ `Foo of 'c]. In the case of normal constructor, those unboxed types are mostly kept under wraps, and only manifest themself when pattern matching using the wrong arity:

type unboxed = Foo of int * int
let error (Foo x) = x
type boxed = Foo of (int * int)
let fine (Foo x) = x

StrongerXi · May 23, 2021, 8:57pm

Hmm I think I understood your point, but wouldn’t the following grammar eliminate the ambiguity and leaves the type system intact? (vertical bar is picked arbitrarily)

`Foo |e1, e2, ..., en| (* n >= 0 *)

So we have either one of following:

`Foo |(1, 2)|
`Foo |1, 2|

It doesn’t look like regular variants anymore, but it’s semantically different from those anyway, and there seems to be no backward compatibility issue if we introduced it this way (I’m just trying to understand the development process behind this, not proposing a change at all).

octachron · May 23, 2021, 9:35pm

The type system needs to handle the situation when different part of the code base try to use the same polymorphic variant constructor with different arities. Consider for instance,

let both f x =
  let `Foo a = x in
  let `Foo |b,c| = x in
  f a + b + c

The type of both is something like: ('a -> int) -> [< `Foo of #(int * int) & 'a] -> int, where 'a and #(int *int) have different kinds and cannot be unified.

StrongerXi · May 24, 2021, 8:03pm

different part of the code base try to use the same polymorphic variant constructor with different arities

This is what I was missing. Thank you so much!

Levi_Roth · July 14, 2023, 11:03am

I think I’m still confused. If the kinds can’t be unified, why can’t that just be a compile-time error at the point where you try to unify them? Is the issue here one of soundness, or is it “just” that it would take more work on the type system to be able to handle cases like this?

octachron · July 17, 2023, 8:10am

OCaml doesn’t have a layout kind system (I was using the notation from the unboxed types RFC) that would allow to express the issue as a layout mismatch. Thus the language doesn’t allow the mismatch to arise by choosing the default layout kind which is used for all first-class types.

Note that echoes of other layouts do appear currently in OCaml, but only in second-class objects like inline records, elements of an unboxed array, or argument of a n-ary constructor (with n>1).

Topic		Replies	Views
Empty polymorphic variant type? Learning	5	1219	October 10, 2021
Second of Two Lessons on Polymorphic Variants: Practical Usecases Learning	0	150	April 2, 2025
Confused about the application operator's behavior with type variant constructors Learning	8	604	December 16, 2022
Polymorphic variant type annotation Learning	5	1014	July 28, 2019
Polymorphic variants & constraints Learning	3	876	May 5, 2020

Why are multiple fields of polymorphic variants not flattened?

Related topics