Memprof helps you discover where memory was allocated, which is certainly useful. However, that may not be enough information to isolate a leak. Sometimes you’d like to know which variables refer to excessive amounts of memory.
For this, you’d want to examine all the garbage collection (GC) roots and report how much memory is reachable from each. This is useful information if you can map a GC root back to a source file and variable.
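The quantity in question, the memory reachable from one value, is what OCaml’s Obj.reachable_words (available since OCaml 4.04) computes. A minimal illustration, assuming the value of interest is already in hand rather than found by walking the GC roots; make_pairs and reachable are illustrative helper names:

```ocaml
(* Sketch: how much heap memory is reachable from one value.
   Obj.reachable_words counts every heap block reachable from its
   argument, headers included; within one call, shared blocks count once. *)

(* Build a list of [n] freshly allocated pairs. *)
let make_pairs n = List.init n (fun i -> (i, i))

(* Words reachable from a value: what a per-root report would show. *)
let reachable v = Obj.reachable_words (Obj.repr v)

let () =
  let l = make_pairs 1000 in
  (* 1000 cons cells + 1000 pairs, each 2 fields + 1 header word *)
  Printf.printf "reachable words: %d\n" (reachable l)
```

Because each call traverses independently, two roots that share a structure would each report it in full if measured this way, so per-root totals can add up to more than the heap size.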
I prototyped code to do this to help with Coq bug https://github.com/coq/coq/issues/12487. In a code base of over 500 source files, it localized several leaks precisely enough that we could find and fix them. But my prototype code is a bit crude. I’d like to clean it up and submit it as a PR. Since this could be done in various ways, I wanted to get some design/API feedback up front rather than risk doing some of the work twice. I’d also like to be confident that such a PR would be accepted and merged in a reasonable amount of time; otherwise, why bother?
caml_do_roots shows how to access the GC roots. There are several types of roots:
- global roots, corresponding to top-level variables in source files
- dynamic global roots
- stack and local roots
- global C roots
- finalized values
Proposed API (in gc.ml):
val print_global_reachable : out_channel -> int -> unit
Prints to out_channel a list of the global roots that reach more than the specified number of words. Each item shows the number of reachable words, the index of the root in the *glob for that file (see the code), and the name of the source file.
Something like this (but with only filenames rather than pathnames):
102678 field 17 plugins/ltac/pltac.ml
102730 field 18 plugins/ltac/pltac.ml
164824 field 20 plugins/ltac/tacenv.ml
1542857 field 26 plugins/ltac/tacenv.ml
35253743 field 65 stm/stm.ml
35201913 field 8 vernac/vernacstate.ml
8991864 field 24 vernac/library.ml
112035 field 8 vernac/egramml.ml
6145454 field 84 vernac/declaremods.ml
6435878 field 89 vernac/declaremods.ml
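A hypothetical emulation of this report, written in OCaml for illustration. The real proposal would enumerate the runtime’s global roots; here the caller supplies (file, field index, value) triples, and the names root, report_lines and print_global_reachable_sketch are made up for this sketch:

```ocaml
(* Hypothetical emulation of the proposed print_global_reachable report.
   The caller passes the roots explicitly instead of the runtime finding them. *)

type root = { file : string; field : int; value : Obj.t }

(* Format one report line: "<words> field <index> <file>". *)
let format_root words r = Printf.sprintf "%d field %d %s" words r.field r.file

(* Report lines for the roots reaching more than [threshold] words. *)
let report_lines threshold roots =
  List.filter_map
    (fun r ->
      let words = Obj.reachable_words r.value in
      if words > threshold then Some (format_root words r) else None)
    roots

let print_global_reachable_sketch oc threshold roots =
  List.iter (fun line -> output_string oc (line ^ "\n")) (report_lines threshold roots)

let () =
  let big = Array.make 10_000 0 and small = ref 0 in
  print_global_reachable_sketch stdout 100
    [ { file = "test.ml"; field = 0; value = Obj.repr big };
      { file = "test.ml"; field = 1; value = Obj.repr small } ]
```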
I would use the ELF information in the binary to map each *glob back to a filename. For example, the address of the symbol *glob [...] corresponds to test.ml. This works for native executables compiled with the -g option; it wouldn’t work for bytecode. It would print an error message if the binary is not ELF or was not compiled with -g. Also, being a little lazy: how essential is it to support 32-bit binaries, and how much more work would it be? (Q: What happens if two source files have the same name but are in different directories? Would the symbol table distinguish them?)
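Once symbol start addresses and their source files have been extracted from the ELF symbol table (that parsing is not shown), the mapping step is a lookup in an address-sorted table. A minimal sketch with made-up addresses; the sym type, lookup and example_table are hypothetical:

```ocaml
(* Hypothetical sketch: map an address to the source file of the symbol
   that contains it, given a table sorted by start address. *)

type sym = { addr : int; file : string }

(* Returns the file of the last symbol whose start address is <= [a],
   or None if [a] precedes every symbol. Binary search, O(log n). *)
let lookup (table : sym array) (a : int) : string option =
  let rec go lo hi best =
    if lo > hi then best
    else
      let mid = lo + (hi - lo) / 2 in
      if table.(mid).addr <= a then go (mid + 1) hi (Some table.(mid).file)
      else go lo (mid - 1) best
  in
  go 0 (Array.length table - 1) None

let example_table =
  [| { addr = 0x1000; file = "a.ml" };
     { addr = 0x2000; file = "test.ml" };
     { addr = 0x3000; file = "b.ml" } |]

let () =
  match lookup example_table 0x2010 with
  | Some f -> print_endline f
  | None -> print_endline "not found"
```

A real version would also check each symbol’s size (the ELF st_size field) so that an address past the end of the last symbol is not misattributed.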
val get_field_index : Obj.t -> int
Returns the *glob index number for the top-level variable (passed as Obj.repr var). I expect there’s no way to recover variable names from the *glob index. In my experiments, the entries in *glob appeared to be in the same order as the variable and function declarations. This would let a developer do a binary search in the code to locate the variable, which is probably a necessity for large, complex files such as Coq’s stm.ml (3300 lines, with 10+ modules defined within the file). (I noticed that variables defined in modules nested within the source file were not in *glob. I expect there is a root for the module as a whole and that those variables can be readily found within that root.)
This would need an extended explanation in
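A hypothetical sketch of the lookup get_field_index would perform: scan the fields of a block for one that is physically equal (==) to the target. The proposal would scan the module’s *glob block; since that block isn’t reachable from plain OCaml here, the sketch takes the block as an argument, and find_field_index is a made-up name:

```ocaml
(* Hypothetical sketch: index of the field of [block] that is physically
   equal to [target], or None. Only scannable blocks are inspected, so
   strings, float arrays and other no-scan blocks are skipped. *)
let find_field_index (block : Obj.t) (target : Obj.t) : int option =
  if Obj.is_int block || Obj.tag block >= Obj.no_scan_tag then None
  else
    let n = Obj.size block in
    let rec go i =
      if i >= n then None
      else if Obj.field block i == target then Some i
      else go (i + 1)
    in
    go 0

let () =
  let a = ref 0 and b = ref 1 in
  (* A tuple stands in for the module's block of top-level values. *)
  let block = Obj.repr (a, b) in
  match find_field_index block (Obj.repr b) with
  | Some i -> Printf.printf "field %d\n" i
  | None -> print_endline "not found"
```

Physical equality only identifies boxed values uniquely. For an immediate value such as an int or a constant constructor, any field holding the same value would match, so the first matching index is not necessarily the variable’s.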
val print_stack_reachable : out_channel -> int -> unit
Prints a backtrace to out_channel that also shows, for each frame, which roots reach more than the specified number of words. (I’d keep the “item” numbers, since there’s no way to translate them to variables and they might give some clues.)
Called from file "tactics/redexpr.ml" (inlined), line 207, characters 29-40
  356758154 item 0 (stack)
Called from file "plugins/ltac/tacinterp.ml", line 752, characters 6-51
  17646719 item 0 (stack)
  119041 item 1 (stack)
Called from file "engine/logic_monad.ml", line 195, characters 38-43
  119130 item 0 (stack)
  373378237 item 1 (stack)
As it turns out, 90% of the memory in the Coq issue mentioned above is reachable only from the stack.
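The frame lines in the sample above have the same shape as OCaml’s own backtraces, so a partial stand-in can be assembled today from Printexc.get_callstack and Obj.reachable_words. It cannot find each frame’s stack roots, which is exactly what the proposal adds; the suspects list and both function names below are hypothetical:

```ocaml
(* Sketch of a partial stand-in: print the current call stack, then the
   reachable size of values the caller names by hand. This does NOT
   discover stack roots; [suspects] is a hypothetical, hand-picked list. *)

(* Keep only the suspects reaching more than [threshold] words. *)
let sizes_above threshold (suspects : (string * Obj.t) list) =
  List.filter_map
    (fun (name, v) ->
      let words = Obj.reachable_words v in
      if words > threshold then Some (words, name) else None)
    suspects

let print_callstack_and_sizes oc threshold suspects =
  (* Frame lines with file/line information need a build with -g. *)
  Printexc.print_raw_backtrace oc (Printexc.get_callstack 50);
  List.iter
    (fun (words, name) -> Printf.fprintf oc "%d %s\n" words name)
    (sizes_above threshold suspects)

let () =
  let table = Hashtbl.create 16 in
  for i = 1 to 1000 do Hashtbl.replace table i (string_of_int i) done;
  print_callstack_and_sizes stdout 100 [ ("table", Obj.repr table) ]
```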
I haven’t considered the other types of roots yet, such as local roots, which I don’t fully understand. Just covering global and stack roots seems like a good contribution. Dynamic global roots may be easy to add if they are otherwise similar to global roots. For the others I could print the reachable words, but I don’t know how to direct the developer to the relevant part of the code, especially if it’s in C. I suppose print_stack_reachable could be a single routine as well. Maybe that’s better.
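If the printers were merged into one routine, one possible shape is a single report type tagged with the kind of root. This is a hypothetical API sketch: none of these names exist in the Gc module, and the sample numbers are taken from the output above:

```ocaml
(* Hypothetical design sketch: one printer for both root kinds. *)

type root_kind =
  | Global of { file : string; field : int }  (* top-level variable slot *)
  | Stack of { item : int }                   (* live slot in a stack frame *)

type root_report = { kind : root_kind; words : int }

(* Render a report in the same style as the two sample outputs. *)
let format_report r =
  match r.kind with
  | Global { file; field } -> Printf.sprintf "%d field %d %s" r.words field file
  | Stack { item } -> Printf.sprintf "%d item %d (stack)" r.words item

(* Print every report above [threshold]; a real version would obtain the
   reports from the runtime rather than from the caller. *)
let print_reachable oc threshold reports =
  List.iter
    (fun r -> if r.words > threshold then output_string oc (format_report r ^ "\n"))
    reports

let () =
  print_reachable stdout 100_000
    [ { kind = Global { file = "stm.ml"; field = 65 }; words = 35253743 };
      { kind = Stack { item = 1 }; words = 119041 } ]
```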
Let me know your thoughts.