Peak memory usage evaluation method

Cjen1 · April 5, 2019, 9:54am

Hi, I’m trying to evaluate multiple versions of a function on various datasets and record the peak memory usage of each version on each dataset.

Optimally this would be a case of setting up each of the tests in a function and then running the program, something like:

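let tester f =
    start_test ();   (* hypothetical: begin recording memory usage *)
    ignore (f ());
    end_test ()      (* hypothetical: stop recording and report the peak *)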

I’ve looked into Spacetime, however it seems to be a whole-program memory profiler. The same goes for the majority of other profiling options.

Landmarks seems to be roughly what I’m looking for, however there were some concerns mentioned on reddit about its validity for profiling memory. Regardless, it seems to use Gc.allocated_bytes, which according to the documentation ignores whether the garbage collector has run and hence whether memory has been released.

I’ve also considered splitting each option into an individual program and then testing it manually, however that seems a bit overkill.

TL;DR: How does one get the peak memory usage of a function on a given input?

gasche · April 6, 2019, 4:13pm

The reason why there is no easy way to do this is that computing this quantity would require constantly monitoring memory usage, which would slow down the program. I can think of several approaches:

If you control the implementation of those functions, you can just call Gc.stat regularly and collect the maximal value of live_words, which gives the size of the live memory in the major heap. (You could either add the size of the minor heap, or compute a more precise size there by subtracting Gc.get_minor_free from it.) Calling Gc.stat in that way is slow (it traverses the whole heap), so you should not do it too often.
You can also use Gc.create_alarm to register a callback that is called at the end of each major GC cycle, so that you can ask for live_words there instead of modifying the measured code. Major collections are rather infrequent, so this would only be somewhat accurate for computations that run for a relatively long time and/or churn through a lot of memory. (Printing from the alarm callback lets you see how often that happens.)
There is a way to use an instrumented version of the GC (see this description for how to do it), and parsing its logs might let you compute this information accurately, but I am not sure that this is currently possible: you would need the live part of the major heap to be logged regularly, and I’m not sure this is done. It would of course be useful to investigate and submit a patch to do it if necessary.
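The first approach could be sketched roughly as follows. This is only an illustration under stated assumptions: sample, peak_live_words and build_list are hypothetical names, and the sampling interval is arbitrary. It uses only Gc.stat, Gc.get and Gc.get_minor_free from the standard library.

```ocaml
(* Running maximum of the live words observed so far (hypothetical helper). *)
let peak_live_words = ref 0

(* Take one sample. Gc.stat is slow, so call this sparingly. *)
let sample () =
  let st = Gc.stat () in
  (* live_words covers the major heap only; add the used part of the
     minor heap, computed as its total size minus its free space. *)
  let minor_used = (Gc.get ()).Gc.minor_heap_size - Gc.get_minor_free () in
  let live = st.Gc.live_words + minor_used in
  if live > !peak_live_words then peak_live_words := live

(* Hypothetical workload that samples every 100_000 iterations. *)
let build_list n =
  let rec go acc i =
    if i = 0 then acc
    else begin
      if i mod 100_000 = 0 then sample ();
      go (i :: acc) (i - 1)
    end
  in
  go [] n

let () =
  let l = build_list 1_000_000 in
  sample ();
  Printf.printf "length = %d, peak live words observed = %d\n"
    (List.length l) !peak_live_words
```

The reported value is only the maximum over the sampled points, so a short-lived spike between two samples would be missed.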

chambart · April 12, 2019, 3:40pm

You can also use the top_heap_words field of Gc.stat. It’s the maximum size for the whole run, but it can still help you pinpoint your memory hot spot.
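A minimal sketch of reading that field, converted to bytes. The helper names bytes_per_word and top_heap_bytes and the array workload are assumptions made for illustration only.

```ocaml
(* Size of a machine word in bytes (8 on 64-bit systems). *)
let bytes_per_word = Sys.word_size / 8

(* Maximum size reached by the major heap so far, in bytes.
   This is a whole-run figure, not a per-function one. *)
let top_heap_bytes () =
  (Gc.stat ()).Gc.top_heap_words * bytes_per_word

let () =
  (* Hypothetical workload: one large array. *)
  let a = Array.make 1_000_000 0 in
  Printf.printf "array length = %d, top heap = %d bytes\n"
    (Array.length a) (top_heap_bytes ())
```

Since the counter never decreases during a run, comparing several functions this way probably means running each one in a fresh process.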

Cjen1 · April 25, 2019, 11:50am

Thanks! I’ve ended up going for the Gc.create_alarm approach.
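For later readers, the alarm approach might look something like the sketch below. It is not the code I actually used: with_peak_live_words is a hypothetical wrapper name, and the sketch assumes the OCaml 4.x runtime that was current at the time of this thread (behaviour on 5.x may differ). It relies only on Gc.create_alarm, Gc.delete_alarm and Gc.stat.

```ocaml
(* Run [f ()] and return its result together with the largest live_words
   value seen at the end of a major GC cycle while it ran. *)
let with_peak_live_words f =
  let peak = ref 0 in
  let record () =
    let live = (Gc.stat ()).Gc.live_words in
    if live > !peak then peak := live
  in
  (* The callback is invoked at the end of each major GC cycle. *)
  let alarm = Gc.create_alarm record in
  match f () with
  | result -> Gc.delete_alarm alarm; (result, !peak)
  | exception e -> Gc.delete_alarm alarm; raise e

let () =
  let sum, peak =
    with_peak_live_words (fun () ->
      let l = List.init 1_000_000 (fun i -> i) in
      (* Force a major collection so the alarm has a chance to fire
         even in this short example. *)
      Gc.full_major ();
      List.fold_left (+) 0 l)
  in
  Printf.printf "sum = %d, peak live words = %d\n" sum peak
```

If no major cycle completes while the function runs, the reported peak stays at 0, which is the accuracy caveat mentioned above for short computations.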
