How does the compiler provide roots to the GC?

bmourad01 · August 16, 2023, 8:17pm

To my understanding, the OCaml compiler provides precise information about GC roots, but I would like to know how this is done, and where in the sources I can find out more.

nojb · August 16, 2023, 9:37pm

My understanding is as follows.

GC roots are found in several places: the OCaml stack, the C stack (local roots of C primitives), the C heap (for manually registered global roots), and machine registers (when using the native compiler).

The roots found in the C stack or heap are explicitly communicated to the runtime, so the compiler does not have to do any work for these.

For roots in the OCaml stack, the runtime needs to walk the stack to find them. When using the bytecode compiler, the OCaml stack consists solely of OCaml values, so the runtime system can easily pick out the values that correspond to heap-allocated objects, as they have a different representation than integers.
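
To illustrate the representation difference: immediate values such as integers have their lowest bit set, while pointers to heap blocks are word-aligned and so have it clear. The sketch below observes this from OCaml using the standard `Obj` module. The `kind` helper is a hypothetical name for illustration only; the runtime does this check in C while scanning, not through `Obj`.

```ocaml
(* Classify a value the way a scanner would: immediate or pointer to a block.
   [kind] is an illustrative helper, not part of the runtime. *)
let kind (v : 'a) : string =
  let r = Obj.repr v in
  if Obj.is_int r then "immediate" else "block"

let () =
  assert (kind 42 = "immediate");
  assert (kind 'c' = "immediate");
  assert (kind [] = "immediate");   (* the empty list is an immediate *)
  assert (kind (ref 0) = "block");
  assert (kind [1; 2] = "block")
```

Only the values classified as "block" are candidates for GC roots; immediates can be skipped without any extra information from the compiler.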

In native code, the OCaml stack may contain other data apart from well-formed OCaml values (e.g. return addresses), and roots may also be stored in machine registers. In this case the runtime system needs some assistance from the compiler. This assistance takes the form of a static data structure generated at compilation time (the frametable): a table mapping return addresses in OCaml functions to the sets of stack offsets and register numbers that contain live roots.
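
As a rough mental model, the frametable can be pictured as a lookup table keyed by return address. The sketch below is a toy model in OCaml, not the runtime's actual C data structure: the type `frame_descr`, the functions `register_frame`, `roots_at` and `stack_roots`, and the addresses used are all made up for illustration.

```ocaml
(* Toy frame descriptor: for one return address, where the live roots are. *)
type frame_descr = {
  frame_size : int;       (* size of the stack frame, in bytes *)
  live_stack : int list;  (* stack offsets holding live roots *)
  live_regs : int list;   (* register numbers holding live roots *)
}

(* Hypothetical frametable, keyed by return address. *)
let frametable : (int, frame_descr) Hashtbl.t = Hashtbl.create 16

let register_frame ~retaddr descr = Hashtbl.replace frametable retaddr descr

(* Roots live at a given return address, if it is a known OCaml call site. *)
let roots_at retaddr = Hashtbl.find_opt frametable retaddr

(* Walk a simulated chain of return addresses and collect
   (return address, stack offset) pairs for every live stack root. *)
let stack_roots chain =
  List.concat_map
    (fun ra ->
      match roots_at ra with
      | Some d -> List.map (fun off -> (ra, off)) d.live_stack
      | None -> [])  (* unknown frame: nothing to report in this model *)
    chain

let () =
  register_frame ~retaddr:0x4010
    { frame_size = 32; live_stack = [0; 16]; live_regs = [3] };
  register_frame ~retaddr:0x4058
    { frame_size = 16; live_stack = [8]; live_regs = [] }
```

In this model, walking the stack amounts to reading each return address, looking up its descriptor, and using `frame_size` to step to the next frame.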

The information about which stack offsets and registers contain roots becomes available after register allocation, and the frametable is recorded at the same time as the final assembly code is emitted. For example, for the amd64 backend:

https://github.com/ocaml/ocaml/blob/53317424ef0224d305a61fda0e7ec6adeb38ddc7/asmcomp/amd64/emit.mlp#L237-L255

          (* Record live pointers at call points -- see Emitaux *)
          
          let record_frame_label env live dbg =
            let lbl = new_label () in
            let live_offset = ref [] in
            Reg.Set.iter
              (function
                | {typ = Val; loc = Reg r} ->
                    live_offset := ((r lsl 1) + 1) :: !live_offset
                | {typ = Val; loc = Stack s} as reg ->
                    live_offset := slot_offset env s (register_class reg) :: !live_offset
                | {typ = Addr} as r ->
                    Misc.fatal_error ("bad GC root " ^ Reg.name r)
                | _ -> ()
              )
              live;
            record_frame_descr ~label:lbl ~frame_size:(frame_size env)
              ~live_offset:!live_offset dbg;
            lbl
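
The excerpt above packs both kinds of location into a single integer list: a register `r` is stored as `(r lsl 1) + 1`, which is always odd, while a stack slot is stored as its offset. The sketch below shows how such an encoding can be decoded by testing the low bit. It assumes stack offsets are even (word-aligned), which the excerpt does not state explicitly; the `location` type and the `encode`/`decode` names are hypothetical.

```ocaml
(* Illustrative model of the two kinds of root location. *)
type location = Register of int | Stack_slot of int

(* Encode as in the excerpt: registers become odd numbers. *)
let encode = function
  | Register r -> (r lsl 1) + 1
  | Stack_slot off -> off  (* assumed even, i.e. word-aligned *)

(* Decode by testing the low bit. *)
let decode n =
  if n land 1 = 1 then Register (n lsr 1) else Stack_slot n

let () =
  assert (encode (Register 3) = 7);
  assert (decode 7 = Register 3);
  assert (decode 16 = Stack_slot 16);
  assert (List.map decode [1; 8; 5] = [Register 0; Stack_slot 8; Register 2])
```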

The frametable data is consumed by the runtime system. Much of the code that does this can be found in the file

https://github.com/ocaml/ocaml/blob/53317424ef0224d305a61fda0e7ec6adeb38ddc7/runtime/caml/frame_descriptors.c

Finally, there is a bit of architecture-specific assembly code that is used to spill all registers used by the OCaml allocator to the stack when a GC is triggered. For example, for the amd64 backend:

https://github.com/ocaml/ocaml/blob/53317424ef0224d305a61fda0e7ec6adeb38ddc7/runtime/amd64.S#L460-L497

Cheers,
Nicolas
