Profiling OCaml programs

Ahrefs recently had to do some profiling to investigate build times. @jchavarri used this opportunity to regroup different techniques in a single document, which was then turned into this blog post.

The post focuses on dune, as it was our target this time. But it applies to basically all ocaml programs.

We hope that it can serve as a starting point if you ever have to conduct similar investigations.

16 Likes

Nice, thanks for sharing this. It’s very useful to have that well-structured and detailed overview!

Do you already have some results of your investigation of build time bottlenecks? In the last section, about Dune tracing tools, you more or less insinuate that you’re mostly interested in analyzing dune’s internal work and not that much in dune’s subprocesses. Does that mean that sub-processes -such as PPX application, or anything contained in the compilation itself, such as reading build artifacts- don’t seem significant so far? Or is it mostly that they’d be harder to optimize?

Btw, as a side-question, just out of curiosity: Have you also happened to try magic-trace?

Our first goal was to look at no-op builds, where dune has nothing to do but create the list of rules and check that they are up to date. It is a necessary step to improve the workflow of our developers when compiling large projects. So we haven’t really looked at how much time was spent in external processes such as ppx or ocamlopt.

We haven’t tried magic-trace. First because our dev servers do not have intel cpu. And second because it seems much more useful to create traces of very specific, short run, parts of a program. That would require us to first have identified parts of the dune codebase that could be the culprit.

The investigation didn’t go very far as we quickly learnt that janestreet is already working on optimising some of dune’s internals. We decided to pause our work until we see what they release.

In general I would say that it’s hard to understand what makes a build slow (when it’s not a no-op). Time spent in dune internals, time spent running preprocessing, time spent running the ocaml compiler, time during which parallelism is very low because a large module is a bottleneck, extra recompilations because of non optimal project layout, linking of the final binary, … In our specific case I can’t say if halving the time spent running ppx would have any impact on the compilation time as seen but the developers.

1 Like

I was trying to benchmark things myself the day before this was posted, so I thought I’d share some things.

I was doing the perf record --call-graph=dwarf + profiler.firefox.com option. But right off the bat, 25% of the time was apparently spent in functions whose callstack was messed up, just one frame not descending from the main function. And of course, profiles are large and super slow to compute even for executions of 1-2s, because dwarf.

I ended figuring out this way of getting a compiler with frame pointers, re-building all the libraries with it, and asking dune to use it (after fruitlessly searching for documentation and a fair amount of confusion):

$ opam switch create 5.1.1+fp ocaml.5.1.1 ocaml-option-fp && eval $(opam env)
$ ocamopt -config | grep -i frame # should be true
# might need to throw in a opam reinstall ocaml-config ocaml if that comes out "false"?
# not sure if I got confused or what
$ opam install --deps-only --switch 5.1.1+fp --locked ./$package.opam
$ eval $(OPAMSWITCH=5.1.1+fp opam env)
$ dune build

That solved the messed-up callstack problem (99% of the time is properly reported). It also fixes the slowness (perf script takes 0.5s instead of 45s before).

The missing line numbers is clearly a problem. The suggestion in the post sadly doesn’t work for me: the perf command is unusably slow (it seems to be calling /usr/bin/addr2line as fast as possible. For every stack maybe? no idea). I’ve been making do with this sort of things:

$ gdb ../../_build/default/path/my.exe <<< 'break camlMylib__Mymodule.fun_9651' | grep -m 1 '(gdb)'
(gdb) Breakpoint 1 at 0x612290: file path/my.ml, line 416.

On the scale of fun, I would probably not give my workflow a 9651…

I had also tried magic-trace with no success. In normal intel processor trace mode, the control flow is so confused that it’s not usable (maybe because of ocaml 5? There should be very little effects, and none of it interleaved with the computation of interest. I didn’t test with ocaml 4. I don’t think I’m using exceptions for control flow). In sampling mode, the profiles might be meaningful, but all the symbols come out as [unknow].

Overall, I was able to profile, but I’d say it’d be useful to improve https://v3.ocaml.org/docs/profiling to contain the kind of information in this thread. Probably renaming that page from “profiling” to “understanding the runtime”, since it hardly talks about profiling, and making a new profiling page from the bit of profiling information from the initial page and this thread.

More generally, IMO it’d also be easier if it wasn’t necessary to interact with opam here. I just want to say dune build --ocaml-config=fp, and that could internally ask opam to grab stuff as necessary. But there seems to be a mismatch between what I want and what opam wants.
As a user mostly unfamiliar with the open source ocaml ecosystem, I want the source of truth for which packages should be available to be my dune files + my lock file (i.e. no concept of opam switch or opam package file is necessary. I thought the dune opam file generation might do some of that, but it seems to require you to go through your dune files to find dependencies for some reason). opam would simply manage a cache of these things.
Whereas opam seems to really want switches to be a user facing concept, which has an identity/state, and that you can sync explictly with your own state using opam lock, opam install --locked (and also opam switch import/export, which I guess are not the same thing?).

4 Likes

For my part, I have used the landmarks package and PPX quite a bit. It’s probably not as powerful as those mentioned here but its easy to use (if your project uses dune), and can be limited to calls to some subset of your code easily. It doesn’t track calls to the GC though.

The short tutorial:

  1. Install landmarks: opam install landmarks landmarks-ppx
  2. Add it to the dune files of the code you want profiled:
    (...
      (preprocess (pps ... landmarks-ppx --auto))|
      (libraries ... landmarks))|
    
  3. Build the project: dune build
  4. Run it with shell variable OCAML_LANDMARKS="format=json,output=profile.json"
  5. Open the generated output file in the Landmarks viewer

I learned of it through this blog post: Profiling OCaml programs the quick and dirty way | Tianyi Song

2 Likes