Compiler optimization on flattening ADT for less boxing?

alxest · January 2, 2024, 11:50am

Maybe this is a question on compilers for functional languages in general, not specific to OCaml.

Consider a nested tuple of type, say: let abc: (A * (B * C)) = (a, (b, c)).
In the usual compilation, its memory representation will contain two levels of indirection: one for the outer pair and another for the inner pair.
However, if I had written the code using type triple = {a: A; b: B; c: C} in the first place, I could have saved one level of indirection. (For simplicity, let us assume that the subterm (b, c) is used only as a subterm of abc, and not elsewhere.)
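
A minimal sketch of the two layouts, assuming int, string and bool as hypothetical stand-ins for A, B and C. The Obj module is used here only to peek at the runtime representation, not as something to rely on in real code:

```ocaml
(* Hypothetical concrete types standing in for A, B and C. *)
type triple = { a : int; b : string; c : bool }

let nested : int * (string * bool) = (1, ("x", true))
let flat : triple = { a = 1; b = "x"; c = true }

(* The outer pair is a block with two fields: the int and a pointer to the inner pair. *)
let nested_size = Obj.size (Obj.repr nested)

(* Field 1 of the outer pair is itself a heap block (the extra indirection). *)
let inner_is_block = Obj.is_block (Obj.field (Obj.repr nested) 1)

(* The record is a single block holding all three fields. *)
let flat_size = Obj.size (Obj.repr flat)
```

With the standard OCaml representation, the nested tuple is a 2-field block pointing to another 2-field block, while the record is one 3-field block.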

I wonder if an optimization that translates the former to the latter is discussed/implemented somewhere? (I have checked this, but I am not sure if they are sufficient to cover all cases - e.g., interprocedural cases)
And if there is such an optimization, I wonder if it covers recursive types? For example, ideally, it could be possible to optimize the usual functional linked list into a vector-like representation (with some assumptions on its API/representation independence).

Thank you.

nojb · January 2, 2024, 12:16pm

Sorry, but I don’t understand the question. The type triple is a sum type, I don’t see how you could rewrite code using the nested tuple type with it. Can you add some more detail?

Cheers,
Nicolas

alxest · January 2, 2024, 12:55pm

@nojb Dear Nicolas,

Sorry, it was a mistake. It should be: type triple = {a: A; b: B; c: C}.
(updated the post too)

Best regards,
Youngju Song

nojb · January 2, 2024, 1:25pm

As far as I know, no such “flattening” optimization on user-defined types exists or is planned/discussed.

Cheers,
Nicolas

silene · January 2, 2024, 1:35pm

There are at least two main difficulties. The first one is specific to OCaml, in that its runtime does not allow pointers to point inside data structures. So, on a flattened value, a call like snd abc would create an invalid pointer, which would wreak havoc in the garbage collector (among other things). Therefore, the compiler would have to perform a global pass to check that a function such as snd is never called on a value of that type.
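
To illustrate the first point, here is a small sketch (assuming int, string and bool as hypothetical stand-ins for A, B and C) showing that, with the current nested layout, snd simply hands back the existing inner block. In a flattened layout there would be no such block to return:

```ocaml
(* Sys.opaque_identity keeps the compiler from treating the pair as a constant. *)
let inner : string * bool = Sys.opaque_identity ("x", true)
let abc : int * (string * bool) = (1, inner)

(* snd only reads field 1 of abc, so no new block is allocated. *)
let bc = snd abc

(* Physical equality on immutable values is implementation-dependent in general;
   with the usual compilers it holds here because bc is the very same block. *)
let same_block = bc == inner
```

If (b, c) were stored inline in abc, the result of snd abc would have to be either a pointer into the middle of abc, which the runtime forbids, or a freshly allocated copy.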

And this brings us to the second major issue: polymorphism. If the language allows for polymorphism, then static typing is no longer sufficient to detect such improper uses of snd. Note also that there is a slight variation on the issue of polymorphism. Consider the type (A * B) * C instead. The snd function now needs to return the third field of the flattened value. But without type information at runtime, it would end up returning the second field, which is not the second component anymore. (A workaround would be to disallow runtime polymorphism and thus to monomorphize the whole program at compile time.)
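
The polymorphism issue can be sketched as follows. The names second and flat are hypothetical, and Obj is used only to mimic an untyped field read on an imagined flattened layout:

```ocaml
(* A polymorphic accessor: compiled once, it just reads field 1 of its argument. *)
let second (p : 'a * 'b) : 'b = snd p

let right_nested : int * (string * bool) = (1, ("x", true))
let left_nested : (int * string) * bool = ((1, "x"), true)

let r1 = second right_nested   (* the inner pair ("x", true) *)
let r2 = second left_nested    (* the boolean true *)

(* Hypothetical flattened layout that both types would share: fields 1, "x", true. *)
type flat = { a : int; b : string; c : bool }
let flattened = { a = 1; b = "x"; c = true }

(* Reading field 1 of the flattened block yields "x" in both cases,
   which is neither the inner pair nor the boolean. *)
let field1 : string = Obj.obj (Obj.field (Obj.repr flattened) 1)
```

The same compiled code for second would need to behave differently for the two types, which is not possible without runtime type information or monomorphization.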

alxest · January 2, 2024, 4:06pm

Dear Nicolas,

Thank you for the input!

Best regards,
Youngju Song
