If the bit is set, OCaml considers the value an integer. If not, it thinks the value is a pointer to a larger object. This is a neat trick used in lots of places. The V8 engine in Chrome famously did this for numbers as well.
However, it does seem silly to do when the type system knows that the type is a simple integer.
It seems that the information that the value is an integer, is lost after type-checking, and is not used to simplify the generated Assembly?
The GC uses that bit to know whether a value is an immediate value or a pointer to a heap allocated block. The GC just needs some discriminant to be there to know how to treat a value, which is why itâs there at runtime despite the type checking. More information is here: Memory Representation of Values - Real World OCaml. Other approaches to this are to maintain this info in a separate data structure (Iâve seen it called a âstack mapâ), or to use conservative GC (where values that âlook likeâ pointers are assumed to be pointers).
In dynamically typed language runtimes (like V8 mentioned in your quote), the trick used is usually that every value is treated as a double, and the NaN bits are used as the metadata (sometimes called NaN-tagging or NaN-boxing). Thereâs a pretty good description here.
In at least some JVMs, everything â everything â comes with type-maps. Of course, every object in the heap has a class, and those have type-maps that tell you which slots are primitive values and which are pointers. Every stack-frame either has a type-map, or one can be generated (e.g. the Solaris JVM used to have a cache of 'em â when that cache filled-up, youâd get inexplicable slowdowns[1] as the VM had to recompute type-maps, only to have them flushed b/c the cache was too full). Then the equivalent of the OCaml runtime requirement that âyou can only suspend execution at âsafe pointsâ where all values are âin representationââ, becomes âyou can only suspend execution at âsafe pointsâ where it is known how to compute the type-map of the current stack-frameâ.
Itâs been a long time since I worked in JVMs, but this was the state of play in the mid-00s.
ETA: oof, I forgot the terminology! When itâs a stack-frame, itâs called a âstack-mapâ, just as @nated says.
[1] gosh, how could I know this? Heh, had to debug that problem onsite at a customer; thank goodness my partner-in-crime had the source to the Solaris JVM, and we could build a custom version with a double-sized stack-map cache.
The article overlooks the fact the fact that most of the time the typechecker does not know that a value is an integer, due to polymorphism and abstract types.
Locally, the compiler will perform some untagging optimizations, as you can observe if you change the function in the article to
As you can see, the tag is only added/removed once, not three times. However, the program must revert to the general representation at function boundaries in order to be able to compose with the rest of the program.
Finally, while maintaining the tag bit on integers carries some cost, this cost is rather small (a few percentage points for integer-heavy computations, see section 5 of https://xavierleroy.org/publi/unboxing-tic97.pdf).