Issues with Incremental Garbage Collection Timing

I’m learning OCaml and have nearly finished reading two articles about the OCaml runtime: “Understanding the Garbage Collector” from *Real World OCaml* and the paper “Retrofitting Parallelism onto OCaml.” Yet I still have some questions:

I understand that a domain requests extra threads to perform incremental garbage collection while blocked on system calls—but aside from that, when else does incremental collection occur? The diagram in the paper shows the mutator alternating with sweeping and marking phases. Is this implemented using safepoints—i.e., by inserting explicit points into the generated code? So far, I only know that `Gc.major_slice` can manually trigger incremental collection, but I haven’t found any other documentation detailing how this mechanism works.

Another question: According to the VCGC (Very Concurrent GC) design, the mutator should be able to run concurrently with both the sweeper and the marker. Why did Multicore OCaml ultimately adopt a design where these phases execute separately—or is my initial understanding simply incorrect?

I should probably continue studying the papers and browsing the OCaml source code to resolve these questions, but my curiosity is too strong—I urgently want answers—so I’ve decided to ask on the forum.

(* Machine-translated; I am not a native English speaker. *)

OCaml inserts “safe-points” into the generated code (I'm not sure whether “safe-point” is the correct term). You can see this for yourself by looking at the assembly generated for the following OCaml program:

let counter = ref 10_000 in
while !counter > 0 do
  counter := !counter - 1
done

Generate assembly with:

ocamlopt -S gcloop.ml

gcloop.s:

        movl    $20001, %eax
.L101:
        cmpq    $1, %rax
        jle     .L100
        addq    $-2, %rax

        ; safe-point check
        cmpq    (%r14), %r15
        ja      .L101
        jmp     .L103   ; jump to safe-point
.L100:
        movl    $1, %eax
        ret
.L103:
        call    caml_call_gc@PLT   ; HERE

Look at the following lines:

        cmpq    (%r14), %r15
        ja      .L101
        jmp     .L103

.L103 is calling caml_call_gc.

This is the “safe-point”. It’s checked in each loop iteration.
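
To make the check easier to read, here is a toy model of it in OCaml. It is only a sketch, not the runtime's code: the record and the functions `make`, `poll`, and `request_gc_work` are hypothetical names. It assumes that `%r15` holds the allocation pointer, that `(%r14)` holds the limit it is compared against, and that the runtime asks for attention by raising that limit so the next check fails.

```ocaml
(* Toy model of the poll check; NOT the runtime's actual code.
   All names below are hypothetical. *)
type state = {
  mutable young_ptr : int;    (* plays the role of %r15 *)
  mutable young_limit : int;  (* plays the role of the word at (%r14) *)
  mutable gc_entries : int;   (* how often we "called caml_call_gc" *)
}

let make () = { young_ptr = 1_000; young_limit = 0; gc_entries = 0 }

(* Mirrors: cmpq (%r14), %r15 ; ja loop ; otherwise call caml_call_gc *)
let poll st =
  if st.young_ptr > st.young_limit then ()  (* fast path: keep looping *)
  else begin
    st.gc_entries <- st.gc_entries + 1;     (* slow path: enter the GC *)
    st.young_limit <- 0                     (* work handled, restore the limit *)
  end

(* Raising the limit makes the next poll take the slow path. *)
let request_gc_work st = st.young_limit <- max_int

let () =
  let st = make () in
  poll st;              (* nothing pending: fast path *)
  request_gc_work st;
  poll st;              (* limit was raised: slow path *)
  poll st;              (* limit restored: fast path again *)
  Printf.printf "entered the GC %d time(s)\n" st.gc_entries
```

The point of the design is that the fast path costs one compare and one branch, so it is cheap enough to run on every loop iteration.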

https://github.com/ocaml/ocaml/pull/10039 modified safe points for OCaml Multicore; there is a lot of information in that PR description. @mneumann is correct: OCaml needs safe points so that code can always be interrupted (e.g. to run the GC, acknowledge signals, etc.).

This aspect did not change much with OCaml 5 compared to OCaml 4. A major slice is performed when the minor heap is half-full, which is triggered by forcing the allocation of a small block to fail (and thus to enter the gc). So the safe points in the code are the allocations.

See the comment in minor_gc.c:caml_empty_minor_heap_promote:

  /* Trigger a GC poll when half of the minor heap is filled. At that point, a
   * major slice is scheduled. */

and domain.c:caml_poll_gc_work:

      /* We have used half of our minor heap arena. Request a major slice on
         this domain. */
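
One rough way to see this from user code is to allocate in a loop and compare the `Gc.quick_stat` counters before and after, without ever calling `Gc.major_slice` or `Gc.full_major`. This is a minimal sketch: the function name `churn` and the allocation counts are arbitrary choices, and the exact numbers depend on the OCaml version and on `OCAMLRUNPARAM`.

```ocaml
(* Sketch: GC activity driven purely by allocation.
   [churn] is a hypothetical helper, not part of any library. *)
let churn n =
  let before = Gc.quick_stat () in
  let keep = ref [] in
  for i = 1 to n do
    (* Sys.opaque_identity keeps the compiler from removing the allocation. *)
    let cell = Sys.opaque_identity (ref i) in
    (* Retain 1 in 100 cells so that some blocks survive a minor GC
       and are promoted to the major heap. *)
    if i mod 100 = 0 then keep := cell :: !keep
  done;
  let after = Gc.quick_stat () in
  ignore (Sys.opaque_identity !keep);
  ( after.Gc.minor_collections - before.Gc.minor_collections,
    after.Gc.major_collections - before.Gc.major_collections,
    after.Gc.promoted_words -. before.Gc.promoted_words )

let () =
  let minors, majors, promoted = churn 1_000_000 in
  Printf.printf "minor collections:       %d\n" minors;
  Printf.printf "major cycles completed:  %d\n" majors;
  Printf.printf "words promoted:          %.0f\n" promoted
```

The counters do not show individual major slices, only completed cycles, so a small run may well report zero major cycles even though slices were performed.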

Here are two things that changed with OCaml 5:

  • the GC can do “opportunistic” major GC work after the minor GC, while waiting on other domains to finish their minor GC,
  • there are additional safe points that do not correspond to allocations, but those are used when another domain requests a stop-the-world event e.g. for triggering the minor gc, not for progressing in the major GC (except for the opportunistic work described above).

The mutator on one domain is allowed to overlap with GC work on other domains. Note that the mutator and the GC may work on the same object. This is still safe; it helps that the major GC is non-moving.
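
A small sketch of a program in which this situation can arise, assuming OCaml 5 (the `Domain` module does not exist in OCaml 4). Two domains allocate at the same time, so each one performs its own major slices while the other may be running mutator code. The helper names `allocate_garbage`, `sum_to`, and `run` are hypothetical, and the program does not prove that overlap happened; it only sets up the conditions for it.

```ocaml
(* Requires OCaml 5. All helper names are hypothetical. *)

(* Allocates n short-lived ref cells and returns 1 + 2 + ... + n. *)
let allocate_garbage n =
  let total = ref 0 in
  for i = 1 to n do
    (* Sys.opaque_identity keeps the compiler from removing the allocation. *)
    let cell = Sys.opaque_identity (ref i) in
    total := !total + !cell
  done;
  !total

(* Closed form used to check the result. *)
let sum_to n = n * (n + 1) / 2

let run n =
  (* The spawned domain and the current domain allocate concurrently. *)
  let worker = Domain.spawn (fun () -> allocate_garbage n) in
  let here = allocate_garbage n in
  let there = Domain.join worker in
  (here, there)

let () =
  let here, there = run 1_000_000 in
  Printf.printf "main domain: %d, spawned domain: %d\n" here there
```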

I have read your answer and @kayceesrk’s explanation, and I now understand. Thank you for your reply.