Large binaries - break down the size by library?



We have observed that some of our binaries have grown in size over several releases, and I’d like to find out what the biggest contributors are. I’d be grateful for recommendations on how best to do this: can the compiler or linker be made to emit this information, or is it better to analyse the binary with something like nm(1)? Obviously I can look at the library files before linking, but that seems a bit tedious and I don’t know how much of each library the linker actually selects.


Some time ago I had the same interest (and still have). I implemented an over-approximation (by hooking into the link step of ocamlbuild and gathering the byte sizes of all linked libraries). This is available here. I also wrote some text about the results over on my blog. This will obviously need some adaptation for the jbuilder era. I also haven’t yet found enough time to upstream this analysis into the mirage utility (or make it more standalone).

I’d also be really interested in an approach that starts from the binary: using objdump/nm to find the function names and sizes, and then some heuristics to put the function names into library buckets. It would be especially interesting to compare the byte counts from the two approaches.
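
A rough sketch of that bucketing idea in Python, not a finished tool. It assumes input in the four-column form printed by GNU `nm -S` (address, size, type, name), that the binary actually carries symbol sizes, and that OCaml symbols look like `caml<Module>__<name>`, which may differ between compiler versions. The sample lines and the function name `bucket_sizes` are invented for illustration. Buckets are module prefixes; mapping those onto libraries would need an extra table.

```python
import re
from collections import Counter

# Assumed naming scheme: "caml" + capitalised module name + "__" + rest.
SYMBOL = re.compile(r"^caml([A-Z][A-Za-z0-9_']*?)__")

def bucket_sizes(nm_output):
    """Sum text-symbol sizes per module prefix from `nm -S`-style output."""
    totals = Counter()
    for line in nm_output.splitlines():
        fields = line.split()
        # Keep only lines of the form: address size type name, in the text section.
        if len(fields) != 4 or fields[2] not in ("T", "t"):
            continue
        size = int(fields[1], 16)
        match = SYMBOL.match(fields[3])
        # Anything that does not look like an OCaml module symbol goes to "(other)".
        totals[match.group(1) if match else "(other)"] += size
    return totals

# Invented sample input, not taken from a real binary.
SAMPLE = """\
0000000000001000 0000000000000040 T camlLogfreq__main_100
0000000000001040 0000000000000020 T camlLogfreq__parse_101
0000000000001060 0000000000000100 T camlUnix__openfile_200
0000000000001160 0000000000000010 T caml_alloc
0000000000002000 D camlUnix__data_begin
"""

if __name__ == "__main__":
    for bucket, size in bucket_sizes(SAMPLE).most_common():
        print(f"{size:8d}  {bucket}")
```

On a real binary the input would come from something like `nm -S yourprogram.native`, with the output passed to `bucket_sizes`.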


Looking at the output from nm(1), the start and end of each module's code are marked with symbols:

: ring3tools $ nm -n logfreq.native | egrep '_code_begin|_code_end' | head
000000000002e500 T caml_startup__code_begin
0000000000030fd4 T caml_startup__code_end
0000000000030fe0 T camlStd_exit__code_begin
000000000003100d T camlStd_exit__code_end
0000000000031010 T camlLogfreq__code_begin
0000000000031fca T camlLogfreq__code_end
0000000000031fd0 T camlUnix__code_begin
0000000000037443 T camlUnix__code_end
0000000000037450 T camlUnixLabels__code_begin
0000000000037ccc T camlUnixLabels__code_end

While this does not account for the space taken up by data, the difference between each pair of addresses could be a good approximation of a module's size.
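
As a minimal sketch of how those markers could be turned into per-module numbers, the Python below pairs each `__code_begin` with its `__code_end` and subtracts the addresses. The function name `code_sizes` is made up for illustration, and the sample input is the nm output shown above; padding between modules and all data sections are ignored.

```python
import re

# Matches lines such as "0000000000031fd0 T camlUnix__code_begin".
LINE = re.compile(r"^([0-9a-fA-F]+)\s+\S\s+(\S+?)__code_(begin|end)$")

def code_sizes(nm_output):
    """Return {symbol prefix: code size in bytes} from `nm -n` output."""
    begins = {}
    sizes = {}
    for line in nm_output.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        addr = int(match.group(1), 16)
        name, kind = match.group(2), match.group(3)
        if kind == "begin":
            begins[name] = addr
        elif name in begins:
            # Size is the distance from the begin marker to the end marker.
            sizes[name] = addr - begins.pop(name)
    return sizes

SAMPLE = """\
000000000002e500 T caml_startup__code_begin
0000000000030fd4 T caml_startup__code_end
0000000000030fe0 T camlStd_exit__code_begin
000000000003100d T camlStd_exit__code_end
0000000000031010 T camlLogfreq__code_begin
0000000000031fca T camlLogfreq__code_end
0000000000031fd0 T camlUnix__code_begin
0000000000037443 T camlUnix__code_end
0000000000037450 T camlUnixLabels__code_begin
0000000000037ccc T camlUnixLabels__code_end
"""

if __name__ == "__main__":
    # Largest contributors first.
    for name, size in sorted(code_sizes(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{size:8d}  {name}")
```

For the ten lines above this gives 21619 bytes for camlUnix, 10964 for caml_startup, 4026 for camlLogfreq, 2172 for camlUnixLabels and 45 for camlStd_exit.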