Polymorphic variant representation

recoules · June 3, 2020, 7:26am

Hi,

I am trying to understand the difference between standard and polymorphic variants in terms of memory representation and code optimization.
So far, what I have understood is this:

standard variants:
- non-parametrized variants are represented as integers, starting from 0 (stored as 0x1) and incrementing;
- parametrized variants are represented as memory blocks with a tag stored in the header – as a byte, tags start at 0 and range up to 255 (with some reserved values at the end).

polymorphic variants:
- non-parametrized variants are represented as integers – the hash of the variant's name;
- parametrized variants are represented as memory blocks with tag 0, and their first field is the hash of their name.
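
The representation described in the two lists above can be checked at runtime with the `Obj` module. This is only a minimal sketch: the type `std` and the tags `` `Foo `` and `` `Bar `` are hypothetical names made up for the illustration, and `Obj` is an unsafe inspection API that should not appear in ordinary code.

```ocaml
(* Hypothetical standard variant type, used only for this illustration. *)
type std = A | B | C of int | D of string

let std_block = Obj.repr (C 42)       (* parametrized standard variant *)
let poly_block = Obj.repr (`Bar 42)   (* parametrized polymorphic variant *)

let () =
  (* Constant constructors are immediate integers, numbered from 0. *)
  assert (Obj.is_int (Obj.repr A));
  assert ((Obj.obj (Obj.repr B) : int) = 1);
  (* Non-constant constructors are blocks; their tags are numbered
     separately from the constant ones, again starting at 0. *)
  assert (Obj.tag std_block = 0);
  assert (Obj.tag (Obj.repr (D "x")) = 1);
  assert (Obj.size std_block = 1);
  (* A constant polymorphic variant is an immediate integer (its hash). *)
  assert (Obj.is_int (Obj.repr `Foo));
  (* A parametrized polymorphic variant is a block with tag 0 whose
     first field is the hash and whose second field is the argument. *)
  assert (Obj.tag poly_block = 0);
  assert (Obj.size poly_block = 2);
  assert (Obj.field poly_block 0 = Obj.repr `Bar)
```

The two `Obj.size` checks show the extra word: one field for `C 42`, two fields for `` `Bar 42 ``.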

I can think of two drawbacks to using polymorphic variants:

- parametrized ones use one more word in memory;
- unlike the tag, which is bound to small numbers, the hash can range from 0 to 2^31 or more, so some optimizations such as compiling pattern matching to a jump table cannot be used, forcing a sequence of if-then-else tests.

Thus, polymorphic variants can end up producing slightly slower code.

Am I forgetting something, or is that all?

Also, since polymorphic variants use a hash function, I am wondering whether a collision is possible between two distinct variants. In theory it is, but I expect it to be a very (very…) rare event.
In practice, does the hash have some property, by design, guaranteeing that there is no collision for any string shorter than X, where X is the longest name writable for a variant?

Regards,

smolkaj · June 3, 2020, 8:02am

This is not really in scope for your question, but another disadvantage of polymorphic variants is that their typing rules are more complicated.

vlaviron · June 3, 2020, 8:02am

There’s an advantage to polymorphic variants that you have not mentioned: you can pass their values between two different types without the cost of a conversion (if the types are compatible).
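
As a small illustration of that point (the types `small` and `large` and the function `describe` are hypothetical names chosen for the sketch), a coercion between compatible polymorphic variant types is purely a typing operation, so the coerced value is physically the same value:

```ocaml
(* Hypothetical types: [large] extends [small] with one more tag. *)
type small = [ `A | `B of int ]
type large = [ small | `C of string ]

let describe : large -> string = function
  | `A -> "A"
  | `B n -> "B " ^ string_of_int n
  | `C s -> "C " ^ s

let v : small = `B 1
let w : large = (v :> large)   (* subtyping coercion, no runtime work *)

let () =
  (* [v] and [w] have different static types, so they are compared through
     [Obj.repr]; physical equality shows that nothing was copied. *)
  assert (Obj.repr v == Obj.repr w);
  assert (describe w = "B 1")
```

With two standard variant types, the same widening would need a conversion function that rebuilds each value.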

I don’t know the exact properties of the hash function, but I believe there is a compile-time check that no two variants with the same hash are used in the same compilation unit. It checks all variant constructors appearing in types, so even if you have a collision, you will never be able to build a program manipulating both values together. It’s possible that you could observe a collision with the polymorphic comparison on values whose type has been existentially quantified, but that’s not specific to polymorphic variants (you can check that 0 has the same representation as false the same way).
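
A sketch of that last remark, assuming the collision check only applies to tags that meet in the same variant type: the GADT `any` and the function `same_repr` below are hypothetical names, and the comparison goes through `Obj.repr` because the two hidden types cannot be compared directly.

```ocaml
(* Hypothetical existential wrapper: the type of the payload is hidden. *)
type any = Any : 'a -> any

(* Compare the raw representations of two hidden values. This is only
   safe here because every value we pass is an immediate integer. *)
let same_repr (Any a) (Any b) = Obj.repr a = Obj.repr b

let () =
  (* 0 and false share the same representation. *)
  assert (same_repr (Any 0) (Any false));
  assert (not (same_repr (Any 0) (Any 1)));
  (* The two colliding tags never appear in the same variant type here,
     so they are accepted and compare equal. *)
  assert (same_repr (Any `squigglier) (Any `premiums))
```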

yallop · June 3, 2020, 8:21am

Yes:

$ ocaml
        OCaml version 4.10.0

# [`squigglier; `premiums];;
Line 1, characters 14-23:
1 | [`squigglier; `premiums];;
                  ^^^^^^^^^
Error: Variant tags `squigglier and `premiums have the same hash value.
       Change one of them.
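
The collision can also be reproduced outside the type checker. The function below is a sketch transcribed from memory of `Btype.hash_variant` in the compiler sources, so treat its exact form as an assumption; the second assertion cross-checks it against the value the compiler actually stores for `` `premiums ``.

```ocaml
(* Sketch of the tag hash, assumed to match the compiler's
   Btype.hash_variant: a base-223 polynomial hash reduced to 31 bits. *)
let hash_variant (s : string) : int =
  let accu = ref 0 in
  for i = 0 to String.length s - 1 do
    accu := 223 * !accu + Char.code s.[i]
  done;
  (* Keep the low 31 bits. *)
  accu := !accu land ((1 lsl 31) - 1);
  (* Reinterpret as a signed 31-bit value on 64-bit platforms. *)
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

let () =
  (* The two names from the error message hash to the same value. *)
  assert (hash_variant "squigglier" = hash_variant "premiums");
  (* Cross-check against the runtime representation of the tag. *)
  assert (hash_variant "premiums" = (Obj.obj (Obj.repr `premiums) : int));
  Printf.printf "%d %d\n" (hash_variant "squigglier") (hash_variant "premiums")
```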

recoules · June 3, 2020, 9:49am

It is true, but they allow more flexibility and I expect them to be easier to learn, from a beginner's point of view, than GADTs.

You are right. It is exactly why I am looking at polymorphic variants. In a different topic, Sum types: sub- & super-type, I present two use cases where polymorphic variants work well but are maybe too powerful for what I want to do.

Oh… so it is possible, but it is caught at compile time.
I am curious: is this an already-known collision example, or did you find it just for me?

SkySkimmer · June 3, 2020, 1:30pm

It’s mentioned in "Practical generic programming in OCaml" (Proceedings of the 2007 Workshop on ML).
See also a web search for "squigglier" ocaml.
