Proposal: Another way to debug memory leaks

memprof helps you discover where memory was allocated, which is certainly useful. However, that may not be enough information to isolate a leak. Sometimes you’d like to know what variables refer to excessive amounts of memory.

For this, you’d want to examine all the garbage collection roots and report how much memory is used by each. This is useful information if you can map a GC root back to a source file and variable.

I prototyped code to do that to help with Coq bug It localized several leaks enough across over 500 source files so that we could find and fix them. But my prototype code is a bit crude. I’d like to clean it up and submit it as a PR. Since this could be done in various ways, I wanted to get some design/API feedback up front rather than maybe doing some of it twice. Also I’d like to be confident that such a PR would be accepted and merged in a reasonable amount of time–otherwise why bother.

caml_do_roots shows how to access the GC roots. There are several types of roots:

  • global roots, corresponding to top-level variables in source files
  • dynamic global roots
  • stack and local roots
  • global C roots
  • finalized values
  • memprof
  • hook

Proposed API (in

val print_global_reachable : out_channel -> int -> unit

Prints a list to out_channel of the global roots that reach more than the specified number of words. Each item shows the number of reachable words, the associated index of the root in the *glob for that file (see the code) and the name of the source file.

Something like this (but with only filenames rather than pathnames):

    102678 field  17 plugins/ltac/
    102730 field  18 plugins/ltac/
    164824 field  20 plugins/ltac/
   1542857 field  26 plugins/ltac/
  35253743 field  65 stm/
  35201913 field   8 vernac/
   8991864 field  24 vernac/
    112035 field   8 vernac/
   6145454 field  84 vernac/
   6435878 field  89 vernac/

I would use ELF information in the binary file to map from *glob back to a filename. For example, the address of the symbol camlTest equals *glob and corresponds to This works for binary executables compiled with the -g option. It wouldn’t work for byte-compiled code. It would print an error message if it’s not ELF or not -g. Also, being a little lazy, how essential and how much more work is it to support 32-bit binaries? (Q: What happens if you have 2 source files with the same name though in different directories? Would the symbol table distinguish them?)

val get_field_index : Obj.t -> int

Returns the *glob index number for the top-level variable (passed as Obj.repr var). I expect there’s no way to recover variable names from the *glob index. In my experiments, it appeared that the entries in *glob were in the same order as as the variable and function declarations. This would let a developer do a binary search in the code to locate the variable, which is probably a necessity for large, complex files such as Coq’s–3300 lines, 10+ modules contained within the file. (I noticed that variables defined in modules contained within the source file were not in *glob. I expect there is a root for the module as a whole and that those variables can be readily found within that root.)

This would need an extended explanation in gc.mli.

val print_stack_reachable : out_channel -> int -> unit

Prints a backtrace to out_channel that also shows which roots for each frame reach more than the specified number of words. (I’d keep the “item” numbers since there’s no way to translate them to variables and they might give some clues.)

Called from file "tactics/" (inlined), line 207, characters 29-40
 356758154 item    0 (stack)
Called from file "plugins/ltac/", line 752, characters 6-51
  17646719 item    0 (stack)
    119041 item    1 (stack)
Called from file "engine/", line 195, characters 38-43
    119130 item    0 (stack)
 373378237 item    1 (stack)

As it turns out, 90% of the memory in the Coq issue mentioned above is reachable only from the stack.

I didn’t consider the other types of roots yet, which I don’t fully understand, such as local roots. Just covering global and stack roots seems like a good contribution. Dynamic global roots may be easy to add if they are otherwise similar to global roots. For the others I could print the reachable words, but I don’t know how to direct the developer to look at the relevant part of the code, especially if it’s in C code. I suppose print_global_reachable and print_stack_reachable could be a single routine as well. Maybe that’s better.

Let me know your thoughts.

cc @ppedrot @SkySkimmer


cc @jhjourdan @stedolan

(brain dump; I’m sure people who know more about memory profiling could say more)

In general those tools make it easy to access information on where values were allocated, and less easy to access information on where they are retained.

OCamlPro’s memprof commercial profiler has a related feature which is the “per-root” view. I think it would let you see where values are retained.

In general computed this information in a fine-grained way is costly, as you would need to remember for each word the reachability path(s) from the roots, or at least the set of roots that can reach it, which would slow down memory traversal quite a bit.

Your approach of only computing a “size” of retained memory for each root seems easier to implement in a not-too-slow way, and useful to identify leaks. (I’m still not sure what you do with values that are reachable from several different roots?) But in general we want to see not only the roots, but reachability paths (as we want to see not just allocation points, but allocation stack traces). Now if we assume that we are already using statmemprof, so a statistical sample of the values are marked “interesting”, can we somewhat-efficiently compute those reachability paths for the sampled values?

This sounds easy to compute at a time where the memory graph has its pointer inverted; this may be the case in the major heap during major collections (but then only a slice is accessible), and during compaction. At this point if you hit a value that is marked as interesting, you can just walk up the inverted pointers to the root and you get a reachability path (but not all paths to reach the value, unfortunately). So maybe this information can be computed efficiently during major GC runs. (Running only in slices mean that the heap changes a bit in the middle, but we can assume that the big picture does not change for paths retaining a lot of memory.)

I don’t know if this can be done in userland today, or if it would need specific runtime support that would require changing the runtime. (Probably the latter.)

1 Like

Thanks @gasche. I inquired about the price for OCamlPro’s tool, which looks interesting, though I’m unlikely to want to spend my own money on that.

I use Obj.reachable_words to count the reachable words, nothing to invent there. Gc doesn’t invert the pointers in the memory graph. It uses 2 bits per allocated block to mark which blocks have been reached. See Real World OCaml under “Marking and Scanning the Heap”. Obj.reachable_words uses the same bits.

I don’t try to account for sharing, so summing up reachable words from multiple roots is not useful. It might be interesting to figure out how much is shared between 2 specified objects; the stack entries in my Coq use case have a huge amount of sharing. But I think you’d need more than 2 bits per block, so that likely requires making a hash table of blocks. That could be computed entirely with OCaml code without modifying the runtime library.

The difficulty in using the “reachability path”, as you put it, is that the compiler doesn’t preserve type information or even AFAIK variable names, so it would be very hard for a developer to use that information directly. They really need to figure out what variable has the excess memory and then work forward from there. And it very quickly becomes completely application dependent.

1 Like

This is excellent. I thought I might make some suggestions, coming from the Java world (where memory leaks have been a bane …)

(1) There, it’s been useful to compute the dominator-tree of the heap-reference-graph: it identifies which object “owns” a particular block of memory, in a way that simple pointer-chasing does not. You’ll find references in lots of places, but I learned this idea from Nick Mitchell (who was at IBM at the time).

[the rest of this is off-the-cuff musings]

(2) Of course, knowing that a block “owns” memory doesn’t help much unless you can map from the block to the code that allocated it. It might be interesting to make a version of the Ocaml runtime and compiler that inserted an extra word of data (a checksum) into allocated blocks, and perhaps produced a mapping from those checksums to source-code locations. You wouldn’t normally compile/run with this, but only do so when you were chasing memory-leaks.

(3) I guess one might imagine implementing #2 as a source-to-source transformation, adding the checksum field to type-definitions.

1 Like

Indeed, OCamlPro did a lot of work on this topic. Memprof was able to dump the full pointer graph, compressed, with type information for every block and names of the static roots and stack roots. We also had some tooling to analyse the graph, reverse it using linear algorithms to fit in memory, so that we could display for every block from where it was reachable (we used it to successfully analyse a 32 GB dump on a computer with 8 GB of RAM). All the tooling was available for OCaml 4.01, but we haven’t upgraded it by lack of community interest and funding. If some people think the project should restart and be open-sourced, we would be happy to discuss how to fund such work, for example by including it as part of a European project or French ANR project.