Great talk, but funny enough not a word about allocation strategy.
I really like the idea of the unboxed types presented and enjoyed watching Stephen’s talk. One thing that puzzels me, is how the GC distinguishes between something of kind
value and something of kind
My understanding is that the GC traverses the whole stack and distinguishes
immediates by its last bit. But how does the GC know something of kind
int64 is not a GC pointer?
Is the idea to add bit masks to stack frames? (I think GHC does this if I’m not mistaken.) Does OCaml use stack frames at all?
If the value is boxed on the heap, then the box itself tells the GC not to scan its content.
If the value is unboxed on the stack, then there is a large table that, for every return address of the program (i.e., every function call), indicates how to interpret every word of the stack. So, when the GC runs, it visits the whole program stack, jumping from one return address to the other, marking the stack values as GC roots (or not), depending on the content of this table.
Great! Thanks @silene, this helps my understanding and triggers another question.
Do I understand correctly that the table you’re referring to, is the addition this proposal makes to keep track of unboxed types and current OCaml only uses pointer tagging? If otherwise: why bother with both mechanisms?
I do not understand your question. The table I am describing has existed forever (or at least for 25 years).
Ah, sorry that I’m unclear. Let me formulate it differently. We have a stack and we’d like to know which words are pointers and which are integers. If there is a table describing which words on the stack are pointers, and which are not, why does OCaml also tag integers to distinguish them? It seems to me as both approaches have the same goal.
All the values in memory form a directed graph. The GC starts from some roots and follow all the edges of the graph until it has accounted all the nodes. Pointers on the stack are roots (if the table says so) and point to values on the heap. Tagging integers is for the sake of heap values; it tells the GC whether a word found in a heap block is a pointer or not.
Note that no word (be it on the stack or on the heap) ever point to a block on the stack, hence this whole discussion: What would be needed to be able to allocate some blocks on the stack?
Because you don’t want the GC to keep doing lookup on the table, checking a single bit is fast, doing a lookup not nearly as much.
The deep reason for a uniform representation is that it allows expressive parametric polymorphism with a simple implementation. The Achilles’ heel of a kind-based memory representation is lack of kind polymorphism, unless you go into compile-time monomorphisation (templates, with its own set of drawbacks) or just-in-time compilation. A fortiori, this allows you to have types inhabited by a mix of pointers and immediates (e.g. the current representation of the
option), though this is not the only way to achieve this.