⚠️ OCaml 5.x memory usage

Hi all,

I wanted to raise a friendly alarm about a corner case in OCaml’s memory usage since the switch to 5.x, which is now about three years old and about to see its fifth release.

This is a graph of the memory usage for a real-world use case of our application. Left is with OCaml 4.14.2 and right is with OCaml 5.3:

I just completed some early testing with the latest alpha release of OCaml 5.4 and, sadly, the problem is still there.

The application in question, liquidsoap, is perhaps an outlier in the OCaml community in that it runs a very short (~0.02s) loop that creates a lot of data (raw audio/video) on each iteration. Most of that data can be discarded after each iteration, but some of it has to be kept for audio/video buffers and such.
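For readers less familiar with this kind of workload, here is a minimal sketch of the allocation pattern described above. Everything in it (names, sizes, the ring buffer) is a hypothetical illustration, not liquidsoap's actual code. One detail worth knowing: in stock OCaml, blocks larger than 256 words (`Max_young_wosize`) bypass the minor heap, so frames of this size are allocated directly in the major heap.

```ocaml
(* Hypothetical sketch of the pattern: every tick allocates a fresh frame
   that dies at the end of the iteration, and every [keep_every]-th frame
   is copied into a bounded ring buffer that lives much longer.
   All sizes are illustrative assumptions. *)

(* ~0.02 s of mono audio at 44.1 kHz, i.e. 882 floats. *)
let frame_samples = 44_100 / 50

let run ~ticks ~keep_every ~buffer_len =
  let ring = Array.make buffer_len [||] in
  let pos = ref 0 in
  for tick = 1 to ticks do
    (* Short-lived data: unreachable once the iteration ends. *)
    let frame = Array.init frame_samples (fun i -> sin (float_of_int (tick + i))) in
    (* Long-lived data: occasionally retained, like an audio buffer. *)
    if tick mod keep_every = 0 then begin
      ring.(!pos) <- Array.copy frame;
      pos := (!pos + 1) mod buffer_len
    end
  done;
  ring

let () =
  let ring = run ~ticks:5_000 ~keep_every:10 ~buffer_len:100 in
  let st = Gc.quick_stat () in
  Printf.printf "kept %d frames; minor_words=%.0f major_words=%.0f top_heap_words=%d\n"
    (Array.length ring) st.Gc.minor_words st.Gc.major_words st.Gc.top_heap_words
```

Running the same program under 4.14 and 5.x and comparing the printed counters is one cheap way to see whether a given pattern behaves differently.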

It could be argued that this kind of application was never meant to be written in OCaml but, still, it has been using the compiler for quite a while without any problems. Liquidsoap is also a scripting language, and OCaml’s tooling for that is awesome.

Clearly, though, this kind of use is overwhelming the new OCaml 5.x GC, which does not seem able to perform as well as the 4.14.x one in a case like this.

Similar problems have been reported through the compiler’s bugtracker:

Please don’t take this post as a criticism of the OCaml developers: the pace and scope of change has been amazing and the attention to making the compiler better is real.

However, at this point, this is becoming an existential issue for us. If we keep being stuck with OCaml 4.14 for our production use, sooner rather than later, we will be unable to use other projects and libraries from the OCaml ecosystem.

My questions are:

  1. Are we the only application in the ecosystem seeing this kind of regression?
  2. Is there any indication that the GC can be brought up to par on this specific case?
  3. If not, is there any interest in opening up the GC’s API to help applications mitigate issues in their use case?
  4. If not, should we consider other tooling/variants, e.g. OxCaml’s stack allocations?

For #1, I think it would be really important to find out. If we are the only case, then the application is indeed an outlier. However, I have a hard time believing that we are the only one in this situation, and if we are not, this becomes a potential problem for the adoption of the compiler.

For #2, I imagine that it’s hard to say without a clear reproducible test. However, I already submitted one for ocaml/ocaml#13123 and it is still pending.

For #3, my naive idea would be that, since the application generates short-lived data on a regular basis, perhaps more control over minor heap allocations and their cleanup could help?

For instance, being able to allocate new data in a specific minor heap and clean up that minor heap only after running one loop iteration?

It’s hard to say if that would make sense, but I feel like this is similar to what OxCaml is doing with its stack allocations.
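Something close to this idea can already be approximated in user code by preallocating buffers and recycling them after each iteration. A minimal sketch, assuming float-array frames; the `arena` type and its functions are hypothetical and are not a runtime or OxCaml API. Unlike a real region, the buffers are ordinary heap values that the GC still tracks; the gain is only that no new frames are allocated per iteration.

```ocaml
(* Hypothetical user-level "arena": a fixed pool of preallocated float
   buffers handed out during one loop iteration and recycled all at once
   by [reset]. Plain OCaml, not a runtime or OxCaml API. *)
type arena = { bufs : float array array; mutable next : int }

let create ~count ~size =
  { bufs = Array.init count (fun _ -> Array.make size 0.); next = 0 }

(* Hand out the next free buffer; its previous contents are NOT cleared. *)
let alloc a =
  if a.next >= Array.length a.bufs then failwith "arena exhausted";
  let b = a.bufs.(a.next) in
  a.next <- a.next + 1;
  b

(* Make every buffer available again, e.g. at the end of one iteration. *)
let reset a = a.next <- 0

let () =
  let a = create ~count:4 ~size:882 in
  for _tick = 1 to 3 do
    let frame = alloc a in
    Array.fill frame 0 (Array.length frame) 1.0;
    (* Anything that must outlive the iteration has to be copied out,
       the user-level analogue of promoting it to the heap. *)
    let _kept = Array.copy frame in
    reset a
  done
```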

For #4, implementing such a drastic change impacts all our APIs.

Even signatures like that of List.fold_left have to be adapted to cover the multiple local use cases.

And, overall, the feature seems to be designed for small functions.

In a case like ours, with multiple layers of abstraction and some cases where local memory needs to be promoted to the heap for long-term storage (for instance, into a temporary audio buffer), it is not immediately obvious how to use the tools there, and there is a real risk of being stuck with an exploratory compiler if these features never make it into the mainstream compiler.

Thanks for y’all’s insight!

No! There have been several reports on the ocaml/ocaml GitHub repository about excessive memory usage. On our side at LexiFi, we have also been observing higher memory usage and out-of-memory crashes more often since switching to OCaml 5 (without being able to point to a precise culprit yet).

At the same time, at least two major issues have been identified in the OCaml 5 GC that can result in higher memory usage relative to OCaml 4. These issues are being worked on by Nick Barnes, Stephen Dolan and Damien Doligez, but it is tricky, painstaking work, so it takes time (but we can rest assured that the people with the right skills are looking at the problem).

For the time being, the usual suggestion seems to be to play with the space_overhead GC parameter (mostly lowering it).
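A minimal sketch of that suggestion, assuming it is applied from inside the program at startup: lower `space_overhead` with `Gc.set`. The value 80 is only an example, not a recommendation; the same setting can also be passed without recompiling via `OCAMLRUNPARAM='o=80'`.

```ocaml
(* Lower space_overhead: the memory "wasted" on not-yet-collected garbage,
   as a percentage of live data. Smaller values make the major GC work
   harder and collect more eagerly. 80 is an arbitrary example value. *)
let () =
  let before = (Gc.get ()).Gc.space_overhead in
  Gc.set { (Gc.get ()) with Gc.space_overhead = 80 };
  Printf.printf "space_overhead: %d -> %d\n" before (Gc.get ()).Gc.space_overhead
```

The trade-off is more major GC work (CPU time) in exchange for a smaller heap, so the value is worth measuring on the real workload.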

To be clear, the above is my understanding of the situation; there may be more to this issue that I am not aware of 🙂

Cheers,
Nicolas

Presumably unrelated to your issue, but since this thread is about the memory usage of programs compiled with OCaml 5, I might as well mention the case of Coq/Rocq.

Since it routinely allocates a huge number of very short-lived blocks, we have long been bumping the size of the minor heap a bit at startup. With OCaml 4, this increase is unnoticeable from a memory point of view, but it has a noticeable impact on performance.

But with OCaml 5, the situation is quite different. This increase of the minor heap means that Coq/Rocq now instantly allocates 30GB of memory, 99% of which will absolutely never be used (since only one domain is used).

This is easily fixable on our side. But this change of behavior really came as a surprise to me.
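The startup tweak described above can be sketched as follows, assuming a hypothetical helper `set_minor_heap_words`. Note that `minor_heap_size` is expressed in words, so 256MB on a 64-bit machine corresponds to 32 Mi words; the example uses a smaller value so that it runs comfortably anywhere.

```ocaml
(* Enlarge the minor heap at startup. minor_heap_size is counted in words,
   not bytes. The value below is an illustrative assumption, much smaller
   than Rocq's 256MB. *)
let set_minor_heap_words words =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = words }

let () =
  let words = 1024 * 1024 in  (* 1 Mi words = 8 MiB with 8-byte words *)
  set_minor_heap_words words;
  let bytes = (Gc.get ()).Gc.minor_heap_size * (Sys.word_size / 8) in
  Printf.printf "minor heap: %d words = %d bytes\n" words bytes
```

On OCaml 5, the runtime reserves address space for this size once per possible domain, which is why the same call costs far more virtual memory than on OCaml 4.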

The increase in major heap size when the minor heap size is increased is surprising. Has an issue already been filed for this?

These issues are being worked on by Nick Roberts, Stephen Dolan and Damien Doligez

I think you meant Nick Barnes and not Nick Roberts.

There have been several reports on the ocaml/ocaml GitHub repository

See the issues tagged with performance on GitHub. A number of these have to do with memory regressions.

The core team has been working to upstream the first of the major fixes: the mark-delay PR ("Mark-delay" performance improvement to major GC, ocaml/ocaml#13580 by NickBarnes), which is close to the finish line. A number of smaller PRs for GC pacing improvements should follow. I’m hoping that we will get these regressions fixed in 5.5.

Oops, indeed! Thanks for the correction.

Cheers,
Nicolas

I am not sure whether the question is aimed at me. But if it is, note that I have never mentioned anything about the major heap; the issue we faced is purely caused by the size of the minor heap. With OCaml 4, having 256MB of minor heap is unnoticeable. With OCaml 5, having 32GB of minor heaps is quite noticeable (256MB * 128 domains by default). Sure, most operating systems will not actually waste physical RAM on those unused 31GB of minor heaps. But it still puts some stress on the user environment. (We noticed the issue because Coq was instantly crashing when called by Why3.)

```
$ ulimit -S -v 30000000

$ ocaml
OCaml version 4.14.1
Enter #help;; for help.

# Gc.set { (Gc.get ()) with Gc.minor_heap_size = 32*1024*1024 };;
- : unit = ()
# 

$ ocaml
OCaml version 5.3.0
Enter #help;; for help.

# Gc.set { (Gc.get ()) with Gc.minor_heap_size = 32*1024*1024 };;
Fatal error: Not enough heap memory to reserve minor heaps
Aborted (core dumped)
```
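The 32GB figure follows from multiplying the per-domain minor heap by the default maximum number of domains. A worked version of that arithmetic, using a hypothetical helper `minor_heap_reservation` and assuming a 64-bit platform (8-byte words); it prints 256 MiB for one domain and 32 GiB for 128 domains.

```ocaml
(* Virtual memory reserved for minor heaps when every possible domain gets
   a slot of the requested size. [minor_heap_reservation] is a
   hypothetical helper, not a runtime API. Assumes 63-bit OCaml ints. *)
let minor_heap_reservation ~minor_heap_words ~word_bytes ~max_domains =
  minor_heap_words * word_bytes * max_domains

let mib = 1024 * 1024
let gib = 1024 * mib

let () =
  let words = 32 * 1024 * 1024 in  (* Gc.minor_heap_size from the transcript *)
  let one = minor_heap_reservation ~minor_heap_words:words ~word_bytes:8 ~max_domains:1 in
  let all = minor_heap_reservation ~minor_heap_words:words ~word_bytes:8 ~max_domains:128 in
  Printf.printf "1 domain: %d MiB; 128 domains: %d GiB\n" (one / mib) (all / gib)
```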

Is this related to ocaml/ocaml#13300 (“OCaml 5 performance regression for unmarshal-heavy workloads”)? If not, please open an issue on GitHub so we can look into it. I’m currently spending most of my week looking into performance issues and trying to reproduce them.

For ocaml/ocaml#13123 (“Regression with default GC settings between `4.14.2` and `5.1.1`”), I haven’t tried again with OCaml 5.3 and with what we’ve learnt from ocaml/ocaml#14047 (“Prohibitive amounts of runtime lock waits in multicore analysis with Infer”). It’s likely that decreasing space_overhead and increasing minor_heap_size will improve things somewhat. Then the series of GC pacing improvements that @kc mentioned will also help.
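A sketch of the two adjustments mentioned above, applied together from inside the program. The helper `tune_gc` and the values are hypothetical; the equivalent environment setting would be `OCAMLRUNPARAM='o=80,s=1M'` (`s` is the minor heap size in words).

```ocaml
(* Apply both suggested knobs at once and return the effective settings.
   [tune_gc] and the example values are illustrative assumptions; the right
   values depend on the workload and should be measured. *)
let tune_gc ~space_overhead ~minor_heap_words =
  let c = Gc.get () in
  Gc.set { c with
           (* Lower values make the major GC collect more eagerly. *)
           Gc.space_overhead = space_overhead;
           (* A larger minor heap means fewer minor collections. *)
           Gc.minor_heap_size = minor_heap_words };
  Gc.get ()

let () =
  let c = tune_gc ~space_overhead:80 ~minor_heap_words:(1024 * 1024) in
  Printf.printf "space_overhead=%d minor_heap_size=%d words\n"
    c.Gc.space_overhead c.Gc.minor_heap_size
```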

When I looked at 30GB, I mistakenly assumed it had to do with the major heap. My bad.

I cannot reproduce the failure on my machines. Can you try

```
OCAMLRUNPARAM="d=1" ocaml
```

which sets the maximum number of concurrent domains to 1 (see “The runtime system (ocamlrun)” in the OCaml manual). This should reserve (and allocate) only 256MB for the minor heap instead of attempting to reserve 128 * 256MB.

This doesn’t sound too exotic to me. A “web application server” could easily exhibit a similar behavior (a good amount of “persistent” data, caches and similar, plus a lot of relatively short-lived allocations to serve each request).

Thank you for the pointers. I’m glad to know that the team is working on it. All the best with it; I’ll wait and will try to help if I can.

Yes, this works. (This is one of the very first things we tried when users first reported the fatal error “Not enough heap memory to reserve minor heaps”.) But telling users to set such an environment variable in their .bashrc is not a long-term solution, for obvious reasons.

I don’t know what a good long-term fix is for this. May I suggest creating an issue on the OCaml Github issue tracker so that we can discuss there?

Btw, is the ulimit at the top intended to trigger the failure? Does it restrict virtual memory to 30GB? I’m curious when users will see this failure in practice. A 64-bit address space provides a lot of virtual address space: even with only 48 to 57 bits used in modern OSes, there is still plenty of it.

Yes, the ulimit call does restrict the virtual memory of a process to 30GB. I am using it because it is the simplest way of reliably triggering the minor heap failure.
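A rough worked check of why this limit is enough to trigger the failure, assuming bash semantics for `ulimit -v` (values in 1024-byte units) and ignoring every other mapping the process needs (code, major heap, stacks). The limit comes to 30,720,000,000 bytes, below the 32 GiB minor-heap reservation, so the check reports that it does not fit.

```ocaml
(* Compare the `ulimit -S -v 30000000` limit with the OCaml 5 minor-heap
   reservation from the transcript. Simplified: other mappings are ignored.
   [ulimit_v_bytes] is a hypothetical helper. *)
let ulimit_v_bytes kib = kib * 1024  (* bash counts -v in 1024-byte units *)

let () =
  let limit = ulimit_v_bytes 30_000_000 in
  (* 32 Mi words * 8 bytes per word * 128 domains *)
  let reservation = 32 * 1024 * 1024 * 8 * 128 in
  Printf.printf "limit = %d bytes, reservation = %d bytes, fits = %b\n"
    limit reservation (reservation <= limit)
```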

And no, a 64-bit address space does not provide a lot of virtual space. Indeed, one still needs to be able to store in physical memory the page tables that cover this virtual space. Even if the page table is actually stored as a tree, it still uses a non-negligible amount of space. Moreover, the operating system also needs to keep a lot of metadata in physical memory to remember who allocated what and where. That is why things like ulimit -v are used to avoid denials of service.

There is also the issue of the operating system’s overcommit policy. Indeed, a standard 32GB allocation should fail on most Linux computers (“wild allocations [are caught] by comparing their request size to total amount of ram and swap in the system”, cf. __vm_enough_memory). But in the case of OCaml on Linux, because minor heaps are allocated with PROT_NONE, the kernel skips the overcommit checks. I don’t know if the OCaml runtime does this on purpose, but I must say that it is quite a clever way of sidestepping the overcommit policy.

Anyway, I do not consider the behavior of the OCaml 5 runtime to be a bug at all. I just wanted to mention it in case some users are wondering why processes that use a non-default minor heap size (e.g., Rocq) become so bloated once compiled with OCaml 5. As for the crash when calling Coq from Why3, we solved it by making Coq a privileged process with unlimited resources.

It should be possible to do better here in terms of reservations. I’ve opened an issue on GitHub to bring this to the attention of the development team: ocaml/ocaml#14153 (“Improving minor heap virtual memory reservation”). Thanks for the response.

@silene: to echo what @kayceesrk mentioned, I think that your issue is relatively easy to fix; it’s just a matter of reporting it. (I think that if you had reported it a year ago, we would have fixed it a year ago.) Do not hesitate to report similar quality-of-life issues in the future.

This talk goes into the details of the regressions and the fixes that will land upstream: https://www.youtube.com/watch?v=XGGSPpk1IB0. The mark-delay PR is the first of these, and a number of pacing fixes should come in after that.

Amazing talk. Excited to test it. I’m guessing the OxCaml compiler has these improvements; I’ll test with it next.

Yes, OxCaml compiler https://oxcaml.org is what you want.

Liquidsoap doesn’t currently build with OxCaml; see the OxCaml health check (https://oxcaml.check.ci.dev/) [1]. It looks like the failure could possibly be fixed with some local refactoring.

What would be useful for the OCaml core devs to know is whether using OxCaml fixes the memory regression that you observe in liquidsoap on OCaml 5.x. If it does, then we’d be reasonably confident that the forthcoming upstream GC pacing fixes will fix the memory regressions. So, if it isn’t too much work, please do give OxCaml a try and report back with your observations.

[1] “Opam Health Check with OxCaml”, Tunbury.ORG
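To make such reports comparable across compilers (4.14, 5.x, OxCaml), one option is to print the runtime's own GC counters at exit. A minimal sketch with a hypothetical `gc_summary` helper built on `Gc.quick_stat`. Memory allocated outside the OCaml heap (for example by C bindings) is not counted here, so an external measurement of peak resident memory is still worth reporting alongside it.

```ocaml
(* Hypothetical helper: summarize GC counters so runs under different
   compilers can be compared. Word counts are in machine words. *)
let gc_summary () =
  let s = Gc.quick_stat () in
  Printf.sprintf
    "OCaml %s: minor_words=%.0f promoted_words=%.0f major_words=%.0f top_heap_words=%d"
    Sys.ocaml_version s.Gc.minor_words s.Gc.promoted_words s.Gc.major_words
    s.Gc.top_heap_words

(* Print the summary on stderr when the program exits. *)
let () = at_exit (fun () -> prerr_endline (gc_summary ()))
```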