Excluding the runtime from performance measurements

For context, I recently added instruction counting support to Coq using the Linux perf API, and then started playing around with a tool to visualize the impact of changes (e.g., to Coq or to Coq plugins) on the number of instructions it takes to run each command of a Coq source file individually.

This works quite well, but large numbers of instructions often move from one command (think one OCaml function call) to another. This is not really surprising, since any small change can lead to garbage collection happening at a slightly different place in the code, but it also makes it hard to see precisely where improvements and regressions happen (especially if no impact on memory usage is expected).

One idea to fix this would be to somehow pause the instruction counter when entering the runtime and resume it when leaving (or some variation of that based on reading the instruction counter on runtime entry and exit and subtracting the difference).
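To make the "read on entry and exit" variant concrete, here is a rough sketch in C of the bookkeeping I have in mind. Everything in it is hypothetical: read_counter stands in for reading the perf instruction counter (here it just returns a fake variable so the logic can be followed without perf), and on_runtime_enter / on_runtime_exit are placeholders for whatever hooks the runtime might offer, not existing OCaml runtime functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware instruction counter. In a real
   setup this would be a read() on a perf_event_open file descriptor. */
static uint64_t fake_counter = 0;

static uint64_t read_counter(void) {
  return fake_counter;
}

/* Instructions attributed to the runtime so far. */
static uint64_t runtime_total = 0;
/* Counter value at the outermost runtime entry. */
static uint64_t runtime_entry = 0;
/* Nesting depth, in case runtime entry points call each other. */
static int runtime_depth = 0;

/* Assumed to be called by a (hypothetical) hook on runtime entry. */
static void on_runtime_enter(void) {
  if (runtime_depth++ == 0)
    runtime_entry = read_counter();
}

/* Assumed to be called by a (hypothetical) hook on runtime exit. */
static void on_runtime_exit(void) {
  if (--runtime_depth == 0)
    runtime_total += read_counter() - runtime_entry;
}

/* Instructions spent outside the runtime since counting started.
   Only meaningful when called from outside the runtime. */
static uint64_t mutator_instructions(void) {
  return read_counter() - runtime_total;
}
```

Per-command costs would then be the difference between two calls to mutator_instructions, one before and one after the command. The depth counter is only there so that nested entries are not counted twice; whether nesting can actually occur depends on where the hooks sit.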

Has anyone ever done something like that? Is there support in the runtime (e.g., hooks) that I could use to implement this? Ideally I would like something that incurs negligible, or at least reasonably low, overhead.

I noticed that OCaml 5.0 has Runtime_events.Callbacks, but we are currently running OCaml 4.14. I also could not tell from a quick read of its documentation whether this module would be a good fit.