To my understanding, the OCaml compiler provides precise information about GC roots, but I would like to know how and/or where in the source listings I can find out more precisely how this is done.
My understanding is as follows.
GC roots are found in several places: the OCaml stack, the C stack (local roots of C primitives), the C heap (for manually registered global roots), machine registers (when using the native compiler).
The roots found in the C stack or the C heap are explicitly communicated to the runtime, so the compiler does not have to do any work for these.
For roots in the OCaml stack, the runtime needs to walk the stack to find them. When using the bytecode compiler, the OCaml stack consists solely of OCaml values, so the runtime system can easily pick out the values that correspond to heap-allocated objects: pointers have a different representation from integers (integers are tagged with their low bit set to 1, while pointers are word-aligned and so have a low bit of 0).
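To make the integer/pointer distinction concrete, here is a small sketch using the standard `Obj` module, which exposes the same test from the OCaml side. The helper name `is_immediate` is mine, purely for illustration; the runtime itself does this check in C.

```ocaml
(* Obj.is_int returns true for immediate (unboxed) values, i.e. those
   whose low bit is set, and false for pointers to blocks.
   [is_immediate] is a hypothetical helper name, not part of the runtime. *)
let is_immediate (x : 'a) : bool = Obj.is_int (Obj.repr x)

let () =
  (* Integers and constant constructors are immediates: never roots. *)
  assert (is_immediate 42);
  assert (is_immediate true);
  assert (is_immediate []);
  (* References, non-empty lists and strings are blocks: potential roots. *)
  assert (not (is_immediate (ref 0)));
  assert (not (is_immediate [1; 2]));
  assert (not (is_immediate "hello"))
```

A scan of a stack made only of such values can therefore skip every word for which this test is true.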
In native code, the OCaml stack may contain other data apart from well-formed OCaml values (e.g. return addresses), and roots may also be stored in machine registers. In this case the runtime system needs some assistance from the compiler. This assistance takes the form of a static data structure generated at compilation time (the frametable): a table mapping return addresses of OCaml function calls to the sets of stack offsets and register numbers that contain live roots at that point.
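As a rough mental model only, the frametable can be pictured as follows. This is a toy sketch in OCaml, not the runtime's actual layout: the type names, the use of a `Hashtbl`, word-sized offsets, and the example addresses are all assumptions made for illustration.

```ocaml
(* Toy model of a frametable. Every name below is hypothetical. *)
type slot =
  | Stack of int  (* offset from the frame's stack pointer, in words *)
  | Reg of int    (* machine register number *)

type frame_descr = {
  frame_size : int;  (* size of the frame, in words *)
  live : slot list;  (* slots holding live roots at this return address *)
}

(* The table itself: return address -> frame descriptor. *)
let frametable : (int, frame_descr) Hashtbl.t = Hashtbl.create 16

let () =
  (* Made-up entries standing in for what the compiler would emit. *)
  Hashtbl.replace frametable 0x1000 { frame_size = 3; live = [Stack 0; Reg 2] };
  Hashtbl.replace frametable 0x2000 { frame_size = 2; live = [Stack 1] }

(* Resolve the live slots recorded for one return address into the
   values they currently hold. [stack] and [regs] are plain arrays
   standing in for the machine stack and the saved registers. *)
let roots_at ~stack ~regs ~sp retaddr =
  match Hashtbl.find_opt frametable retaddr with
  | None -> []  (* no descriptor: not an OCaml frame in this model *)
  | Some d ->
    List.map
      (function
        | Stack ofs -> stack.(sp + ofs)
        | Reg r -> regs.(r))
      d.live
```

In this model a stack walk would call `roots_at` for the current return address, then advance `sp` by `frame_size` to reach the caller's frame and repeat.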
The information of which stack offsets and registers contain roots becomes available after register allocation, and the frametable is recorded at the same time as the final assembly code. For example, for the amd64 backend:
The frametable data is consumed by the runtime system. Much of the code that does this can be found in the file
Finally, there is a bit of architecture-specific assembly code that is used to spill all registers used by the OCaml allocator to the stack when a GC is triggered. For example, for the amd64 backend:
Cheers,
Nicolas