I was trying to benchmark things myself the day before this was posted, so I thought I’d share some things.
I was using the perf record --call-graph=dwarf + profiler.firefox.com option. But right off the bat, 25% of the time was apparently spent in functions whose call stack was messed up, just one frame not descending from the main function. And of course, profiles are large and super slow to process even for executions of 1-2s, because of DWARF.
I ended up figuring out this way of getting a compiler with frame pointers, rebuilding all the libraries with it, and asking dune to use it (after fruitlessly searching for documentation and a fair amount of confusion):
$ opam switch create 5.1.1+fp ocaml.5.1.1 ocaml-option-fp && eval $(opam env)
$ ocamlopt -config | grep -i frame # should be true
# might need to throw in an opam reinstall ocaml-config ocaml if that comes out "false"?
# not sure if I got confused or what
$ opam install --deps-only --switch 5.1.1+fp --locked ./$package.opam
$ eval $(OPAMSWITCH=5.1.1+fp opam env)
$ dune build
That solved the messed-up callstack problem (99% of the time is properly reported). It also fixes the slowness (perf script takes 0.5s instead of 45s before).
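For reference, here is a rough sketch of how the frame-pointer check and the perf steps could be scripted together. The function names, the perf.data and profile.perf file names, and the use of --call-graph=fp and -F +pid are my assumptions (the original commands only show the dwarf variant), so treat it as a starting point rather than the exact commands used above.

```shell
# Reads `ocamlopt -config` output on stdin and succeeds only if the
# frame-pointer line reports "true" (same idea as `grep -i frame` above).
has_frame_pointers() {
  grep -i 'frame' | grep -qi 'true'
}

# Hypothetical wrapper: profile_fp EXE [ARGS...]
# Records with frame-pointer unwinding instead of DWARF, then dumps a text
# profile that can be loaded into profiler.firefox.com.
profile_fp() {
  if ! ocamlopt -config | has_frame_pointers; then
    echo "current compiler was not built with frame pointers" >&2
    return 1
  fi
  perf record --call-graph=fp -o perf.data -- "$@" &&
    perf script -i perf.data -F +pid > profile.perf
}
```

Something like `profile_fp ./_build/default/path/my.exe` would then leave profile.perf in the current directory.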
The missing line numbers are clearly a problem. The suggestion in the post sadly doesn’t work for me: the perf command is unusably slow (it seems to be calling /usr/bin/addr2line as fast as possible, maybe for every stack? No idea). I’ve been making do with this sort of thing:
$ gdb ../../_build/default/path/my.exe <<< 'break camlMylib__Mymodule.fun_9651' | grep -m 1 '(gdb)'
(gdb) Breakpoint 1 at 0x612290: file path/my.ml, line 416.
On the scale of fun, I would probably not give my workflow a 9651…
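The gdb lookup above can be wrapped in a small helper so it prints just file:line. This is only a sketch: the function names are made up, and it assumes gdb's batch mode (-batch -ex) prints the same "Breakpoint 1 at ..." line as the interactive session shown above.

```shell
# Turn gdb's "Breakpoint 1 at 0x...: file F, line N." into "F:N".
# Works with or without the leading "(gdb) " prompt.
parse_breakpoint_line() {
  sed -n 's/^.*Breakpoint [0-9]* at 0x[0-9a-f]*: file \(.*\), line \([0-9]*\)\.$/\1:\2/p'
}

# Hypothetical helper: sym_to_line EXE SYMBOL
# Asks gdb where it would put a breakpoint for SYMBOL, without running EXE.
sym_to_line() {
  gdb -batch -ex "break $2" "$1" 2>/dev/null | parse_breakpoint_line
}
```

Usage would be along the lines of `sym_to_line ../../_build/default/path/my.exe camlMylib__Mymodule.fun_9651`.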
I had also tried magic-trace with no success. In normal Intel Processor Trace mode, the control flow is so confused that it’s not usable (maybe because of ocaml 5? There should be very few effects, and none of them interleaved with the computation of interest. I didn’t test with ocaml 4. I don’t think I’m using exceptions for control flow). In sampling mode, the profiles might be meaningful, but all the symbols come out as [unknown].
Overall, I was able to profile, but I’d say it’d be useful to improve https://v3.ocaml.org/docs/profiling to contain the kind of information in this thread. Probably renaming that page from “profiling” to “understanding the runtime”, since it hardly talks about profiling, and making a new profiling page from the bit of profiling information from the initial page and this thread.
More generally, IMO it’d also be easier if it wasn’t necessary to interact with opam here. I just want to say dune build --ocaml-config=fp, and that could internally ask opam to grab stuff as necessary. But there seems to be a mismatch between what I want and what opam wants.
As a user mostly unfamiliar with the open source ocaml ecosystem, I want the source of truth for which packages should be available to be my dune files + my lock file (i.e. no concept of opam switch or opam package file is necessary. I thought the dune opam file generation might do some of that, but it seems to require you to go through your dune files to find dependencies for some reason). opam would simply manage a cache of these things.
Whereas opam seems to really want switches to be a user-facing concept, with an identity/state that you can sync explicitly with your own state using opam lock, opam install --locked (and also opam switch import/export, which I guess are not the same thing?).