[ANN] A tool to reverse debug OCaml/other binary runs

sid · March 24, 2025, 1:08pm

I’d like to announce a debugging tool I’ve built! It’s called Software Counters mode rr.

It is available on GitHub at sidkshatriya/rr.soft: Use the rr debugger without HW performance counters!

Many of you may have already heard of a debugger called rr. It allows you to record and replay programs on Linux. It is extremely useful, for instance, for debugging garbage collection problems or other low-level issues in natively compiled OCaml programs. Once you capture a bug during the record phase, that bug can be replayed any number of times.
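For readers who have not used rr before, the record/replay workflow described above can be sketched roughly as follows. This is a minimal illustration using only the standard upstream `rr record` and `rr replay` commands; the helper name `record_then_replay` and the program path `./my_program` are hypothetical, and any options specific to Software Counters mode are documented in the rr.soft repository rather than shown here.

```shell
# Sketch only: wraps the two standard rr commands in a small helper.
# "record_then_replay" and "./my_program" are hypothetical names.
record_then_replay() {
    # Record one run of the program (and its arguments) into a trace.
    # The exit status is deliberately not checked: the run you want to
    # replay is often the one that fails.
    rr record "$@"

    # Replay the most recent trace under gdb. Inside the session, gdb's
    # reverse-continue and reverse-step commands run execution backwards.
    rr replay
}

# Example invocation (commented out so sourcing this file has no side effects):
# record_then_replay ./my_program arg1
```

Because the replay reproduces the recorded execution, a breakpoint or watchpoint set in one replay session should trigger at the same point in the next one.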

One major limitation of rr is that it requires access to CPU hardware performance counters, which are usually not available in cloud VMs or containers. On some CPUs (e.g. Zen) the HW counters can be unreliable or high-latency, or it may simply be difficult to get them working for your particular configuration.

Software Counters mode rr is a modification of the rr debugger that lifts this limitation – access to CPU Hardware Performance counters is not required. This means you can run rr in many more configurations.

I’ve been able to successfully record/replay the whole OCaml compiler test suite using Software Counters mode rr (except for a single OCaml test called pr2195, which exhausts the file descriptors).

I’ve also written a blog post about record/replay debugging generally and Software Counters mode rr in particular. Please see here.

kayceesrk · March 24, 2025, 3:04pm

This is very cool. I’ve been a heavy rr user. I appreciate the work to get the software mode working. How slow is the software mode compared to the hw mode in practice? Are there other limitations?

sid · March 24, 2025, 3:56pm

Thank you!

Software counters mode is definitely slower due to overheads from dynamic instrumentation. Some of that overhead can be managed by using (optional) static instrumentation via custom C/C++ compiler plugins (which are provided in the repo).

But given that you would probably be using this with binaries produced by the OCaml compiler rather than C/C++ binaries, the dynamic instrumentation overhead is probably unavoidable.

I would love it if you were able to try it out sometime and tell me how it went. Is the performance acceptable for your use case? Is the debugging experience reliable/robust?

As for limitations: in general, dynamic instrumentation can make things more fragile as well as slower.

But when HW counters are not available, Software Counters mode could be better than having no option at all. Additionally, on some CPUs Software Counters mode might be worth trying anyway (on Zen CPUs the HW counters can sometimes be unreliable).

Also, I currently do not instrument JITed code (nevertheless, in most cases debugging should work fine when JITs are present). Given that OCaml does not do JIT, this should not be a big concern.

One additional limitation of Software Counters mode rr is that it currently needs to be used on a mainstream Linux distribution (Fedora / Ubuntu / Debian unstable) due to a need for good debuginfod support. See Home · sidkshatriya/rr.soft Wiki · GitHub for a detailed explanation of why this is the case, along with other useful information!

lambda_foo · March 24, 2025, 9:56pm

Adding this to my toolkit, awesome work.

Do you think ARM64 support will be difficult to get working? When I last looked, rr needed Linux to correctly identify the underlying CPUs (e.g. Apple Silicon M3 Pro) to set up hardware performance counters. At the time only M1/M2 were supported by Linux.

sid · March 24, 2025, 10:11pm

aarch64 support exists for Software Counters mode rr!

I am, for instance, able to run rr in Linux VMs on Apple Silicon macOS machines, which I think is very cool (if I say so myself!). There are still some rough edges that need to be resolved, but it generally works well.

I’ve just not made the aarch64 code public yet. I’m trying to figure out my strategy for that. As a start I’ve released the x86-64 version to the world.

sid · May 5, 2025, 4:25pm

BTW aarch64 support was released – forgot to mention it on this thread.

sid · May 8, 2025, 5:11am

BTW the debuginfod limitation was also removed some time ago, i.e. debuginfod support in your Linux distribution is no longer needed.

A flake.nix is provided so you can get up and running very quickly in case you don’t want to follow the build instructions manually.

TL;DR Software Counters mode can run on any recent aarch64/x86-64 Linux distro.

sid · May 23, 2025, 12:52pm

I’ve written a tutorial on using both rr and its Software Counters mode variant.

See me/009-rr-on-aarch64.md at master · sidkshatriya/me · GitHub

The tutorial starts from the basics and covers quite a lot of ground. I would love feedback on the blog post.

If you’re interested in reverse debugging, please check it out!

@kayceesrk Since you’ve used rr quite a lot, I would particularly appreciate your feedback, should you have the time.
