We have a server that OOMs every 30 minutes, exhausting 16G of RAM. Thinking we were leaking some global state, we memprof’d it and were surprised by these results:
A legend, if you're not used to this output: in black, the total memory consumption at time t; in blue, the memory allocated at time t that is still uncollected within the highlighted (red) zone.
If you compare the blue curves, they "start" later and later as we progress through the pictures. So it seems that we are not actually leaking anything, since everything is eventually collected. However, this "backlog" of collectable memory grows steadily until system memory is exhausted. We can sort of see the GC triggering at regular intervals, so it's as if the amount of memory freed during each interval were always less than the amount allocated in that interval, so the garbage keeps piling up. As if the major slice size were too low? It could also be that the GC triggers less and less often, but that seems unlikely.
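To check whether the major GC really does complete at regular intervals (rather than less and less often), one option is to hook a GC alarm that logs heap statistics at the end of every major cycle. This is only a minimal sketch using the standard `Gc.create_alarm` and `Gc.quick_stat` APIs; which fields are worth logging is our own guess:

```ocaml
(* Register a GC alarm: the callback runs at the end of every major
   collection cycle, so the log shows how often major cycles finish
   and how large the heap is at that point. *)
let () =
  let _alarm =
    Gc.create_alarm (fun () ->
        let s = Gc.quick_stat () in
        Printf.eprintf "[gc] major cycle %d finished, heap = %d MiB\n%!"
          s.Gc.major_collections
          (s.Gc.heap_words * (Sys.word_size / 8) / (1024 * 1024)))
  in
  ()
```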
In any case, the smoking gun proving that we're not leaking is that firing a Stdlib.Gc.full_major ()
every minute solves the problem entirely, keeping RAM consumption around 1G even after an hour. (Each call usually completes in ~100ms, so ironically, for our use case it is a pretty good garbage collection algorithm.)
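For reference, a minimal sketch of the kind of periodic full_major workaround we mean (the dedicated thread and the 60 s interval are illustrative only, not our exact production code):

```ocaml
(* Illustrative workaround: force a full major collection every 60 s
   from a background thread. Requires linking the threads library. *)
let start_periodic_full_major () =
  ignore
    (Thread.create
       (fun () ->
         while true do
           Thread.delay 60.;
           (* Walks the whole major heap; ~100ms in our case. *)
           Gc.full_major ()
         done)
       ())
```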
Are we correct in concluding that this is a bug? Tweaking the GC settings to make it more aggressive could also solve our issue, but I would expect it never to diverge like this in the first place?
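For completeness, the kind of tuning we have in mind is lowering space_overhead (default 120), either via Gc.set at startup or via OCAMLRUNPARAM; the value 40 below is only an example we have not validated:

```ocaml
(* Example of more aggressive GC tuning: shrink space_overhead so the
   major GC works harder relative to allocation. The value 40 is only
   an illustration; roughly equivalent to OCAMLRUNPARAM='o=40'. *)
let () =
  let params = Gc.get () in
  Gc.set { params with Gc.space_overhead = 40 }
```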
Thanks for your time!