Should 32-bit OCaml's header format allow arbitrary sizes?



I’d like to float this idea and see what the community thinks. As you may or may not know, 32-bit OCaml limits the size of data structures because of its block header design. Only 22 bits of the header are allocated to the size field, so a block can hold at most about 4M words. That caps strings at roughly 16MB (4M words × 4 bytes) and arrays at about 4M elements. 64-bit OCaml has a much wider size field, so in practice it doesn’t have this limitation.
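To make the numbers concrete, here is a rough sketch of the arithmetic, assuming the usual 32-bit layout of a 22-bit size field counted in 4-byte words. The names `wosize_bits`, `max_wosize` and so on are just labels for this example, not runtime identifiers. The last lines print what your own build reports through the standard `Sys` module.

```ocaml
(* Sketch only: limits implied by a 22-bit size field on a 32-bit build.
   All names below are made up for illustration. *)
let wosize_bits = 22      (* bits of the header used for the size, in words *)
let bytes_per_word = 4    (* 32-bit words *)

(* Largest block size, in words. *)
let max_wosize = (1 lsl wosize_bits) - 1

(* Largest block size, in bytes. *)
let max_block_bytes = max_wosize * bytes_per_word

(* Strings always keep at least one padding byte at the end of the block. *)
let max_string_length = max_block_bytes - 1

(* One word per element for ordinary arrays, two words per unboxed float. *)
let max_array_length = max_wosize
let max_float_array_length = max_wosize / 2

let () =
  Printf.printf "32-bit layout: string %d bytes, array %d, float array %d\n"
    max_string_length max_array_length max_float_array_length;
  (* What the compiler you are actually running reports. *)
  Printf.printf "this build (%d-bit): string %d bytes, array %d\n"
    Sys.word_size Sys.max_string_length Sys.max_array_length
```

On a 32-bit build the two lines should agree. On a 64-bit build the second line shows how much larger the limits are there.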

My question is: is it time to move beyond this limitation? Most systems in active use today run 64-bit OCaml and never have to worry about it. Yet because the limitation exists, programmers who want portable code need to keep in mind the maximum size of strings, arrays and hashtable buckets, and use Bigarray or Bigstring where they can. It makes simple programs more complicated than they should be. Programs running on 32-bit systems, on the other hand, probably suffer from vulnerabilities because their authors did not necessarily account for the limitation.

Fixing this would entail adding a 32-bit ‘size’ field to every object in memory for 32-bit systems. It’s a steep cost, but given the simplification it provides, and the fact that the world has moved to 64-bit systems anyway (which would not be impacted), it seems to me to be worth it. It would also mean that after a few versions, we’d no longer have to worry about such serious size limitations.
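To give a feel for how steep the cost is, here is a back-of-the-envelope sketch. It assumes the new size field is simply one extra word next to the existing one-word header, which is only one possible way to implement this, and the function names are made up for the example.

```ocaml
(* Sketch only: per-block memory cost of one extra header word.
   Assumes the proposal is implemented as header + size word + fields. *)

(* Words used today by a block with n fields: one header word plus fields. *)
let words_today n = 1 + n

(* Words used if a separate size word is added (assumption). *)
let words_proposed n = 2 + n

(* Growth in percent, rounded down. *)
let overhead_percent n =
  100 * (words_proposed n - words_today n) / words_today n

let () =
  List.iter
    (fun (name, n) ->
      Printf.printf "%-18s %4d -> %4d words (+%d%%)\n"
        name (words_today n) (words_proposed n) (overhead_percent n))
    [ ("ref / Some x", 1);
      ("pair / cons cell", 2);
      ("3-field record", 3);
      ("1000-element array", 1000) ]
```

Small blocks such as list cells pay the most in relative terms, while large arrays and strings, the objects that actually hit the limit, barely notice.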

There are other benefits as well: once we do this, we can increase the number of possible variants in a type (currently bounded by the 8-bit tag field), and we’ll have a whole bunch of bits free to use for whatever we deem necessary. Right now, the 32-bit format leaves almost no spare bits, meaning that any plan to use header bits is constrained by what fits in 32 bits.


Seems a positive change overall.

Is there a way to introduce this as a branch or separate build to limit the impact?