GC omitting an increasing pile of garbage until OOM ensues

We have a server that OOMs every 30 minutes, exhausting 16G of RAM. Thinking we were leaking some global state, we memprof’d it and were surprised by these results:





Legend, if you’re not used to this output: in black, the total memory consumption at time t; in blue, the memory from the highlighted (red) zone that is still uncollected at time t.

If you compare the blue curves, they “start” later and later as we progress through the pictures. So it seems that we are not actually leaking anything, since everything is eventually collected. However, this “backlog” of collectable memory grows steadily until system memory is exhausted. We can sort of see the GC triggering at regular intervals, so it’s as if the amount of memory freed during each interval were always less than the amount allocated, so that garbage keeps piling up. As if the major slice size was too low? It could also be that the GC triggers less and less often, but that seems unlikely.
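One way to tell those two hypotheses apart would be to log the runtime's own counters at a fixed interval. This is only a sketch: `take_sample`, `diff` and `report` are hypothetical helper names, and it assumes that `Gc.quick_stat` reports meaningful `major_collections` and `heap_words` values on the runtime in use.

```ocaml
(* Hypothetical helpers: sample two GC counters and compare snapshots. *)
type sample = { majors : int; heap : int }

(* Gc.quick_stat is cheap: it does not walk the heap. *)
let take_sample () =
  let s = Gc.quick_stat () in
  { majors = s.Gc.major_collections; heap = s.Gc.heap_words }

(* Change in both counters between two snapshots. *)
let diff ~before ~after =
  { majors = after.majors - before.majors; heap = after.heap - before.heap }

(* One log line per interval. *)
let report ~before ~after =
  let d = diff ~before ~after in
  Printf.sprintf "major collections: +%d, heap: %+d words" d.majors d.heap
```

If the number of major collections per interval stays roughly constant while the heap keeps growing, each cycle is reclaiming less than what gets allocated; if that number drops over time, the GC really is triggering less often.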

In any case, the smoking gun that proves we’re not leaking is that firing a `Stdlib.Gc.full_major ()` every minute entirely solves the problem, keeping RAM consumption around 1G even after an hour. (This usually completes in ~100ms, so ironically, for our use case, it is a pretty good garbage collection algorithm :clown_face:.)
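For reference, the workaround can be sketched roughly like this. It is not our actual code: `make_periodic_full_major` is a hypothetical name, and the clock is passed in as `now` so that the snippet depends on nothing outside the standard library.

```ocaml
(* Hypothetical helper: returns a [tick] function to call regularly,
   e.g. once per iteration of the server's main loop. [tick ()] runs
   Gc.full_major and returns true when at least [period] seconds have
   passed (according to [now]) since the last forced collection. *)
let make_periodic_full_major ~period ~now =
  let last = ref (now ()) in
  fun () ->
    let t = now () in
    if t -. !last >= period then begin
      Gc.full_major ();
      last := t;
      true
    end else
      false
```

With the `unix` library linked, `make_periodic_full_major ~period:60.0 ~now:Unix.gettimeofday` would give the once-a-minute behaviour described above.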

Are we correct in concluding that this is a bug? While tweaking the GC settings to make it more aggressive could also solve our issue, I suppose it should never diverge like this?
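By "more aggressive" I mean something along these lines. This is a sketch only: `make_gc_more_aggressive` is a hypothetical name, the value 40 is arbitrary, and we have not checked whether it actually prevents the divergence.

```ocaml
(* Hypothetical helper: lower space_overhead so the major GC works
   harder, at the cost of more CPU time. Returns the previous value
   so that it can be restored later. *)
let make_gc_more_aggressive ?(space_overhead = 40) () =
  let before = Gc.get () in
  Gc.set { before with Gc.space_overhead = space_overhead };
  before.Gc.space_overhead
```

The same setting can be passed without recompiling, through `OCAMLRUNPARAM='o=40'`.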

Thanks for your time!

Cc @thufschmitt @bnguyenvanyen


What version of OCaml are you using?

This seems like an instance of the GC pacing bug that other users have encountered. See “Poor GC behavior on OCaml 5” (ocaml/ocaml issue #13868). There is a work-in-progress fix at https://github.com/ocaml/ocaml/pull/13580.


Sorry, I should have specified that: initially observed on 5.2.2, then upgraded to 5.3.0 for memprof, and the problem persists.

Thanks! As long as it’s a known issue, we’re fine with the workaround until it’s fixed.


Probably related to: “Regression with default GC settings between `4.14.2` and `5.1.1`” (ocaml/ocaml issue #13123).