Debug OCaml code

As a near-term solution that offers time travel but not fully interactive debugging, I think this would be super useful. This could be solved entirely on the dune side, by providing some way to autoload printers that follow a naming convention like the one in deriving_show. To me the most annoying thing in debugging is all the work required to get ocamldebug to print something useful.
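
To make the naming-convention idea concrete: ppx_deriving's show plugin generates printers named pp_<type> and show_<type>, and a printer of type Format.formatter -> t -> unit is also the shape that ocamldebug's install_printer command accepts. The sketch below is a hand-written printer in that style. The vec3 type and its printer are hypothetical, and nothing here implements the dune-side autoloading being proposed.

```ocaml
(* A hypothetical game-engine type. *)
type vec3 = { x : float; y : float; z : float }

(* Printer following the ppx_deriving.show naming convention (pp_<type>)
   and the Format.formatter -> t -> unit shape that ocamldebug's
   install_printer command expects. *)
let pp_vec3 (fmt : Format.formatter) (v : vec3) : unit =
  Format.fprintf fmt "{ x = %g; y = %g; z = %g }" v.x v.y v.z

(* The matching show_<type> helper, also in the deriving style. *)
let show_vec3 (v : vec3) : string = Format.asprintf "%a" pp_vec3 v

let () = print_endline (show_vec3 { x = 1.0; y = 2.5; z = -3.0 })
```

Today this has to be wired up by hand in ocamldebug, roughly with load_printer on the compiled .cmo and then install_printer with the module-qualified printer name; that manual step is what a naming convention plus dune support could automate.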

I should mention that my use-case (a game engine) uses tsdl and Owl, and one of them (can’t remember which off-hand) simply doesn’t want to work in bytecode mode, so it’s not an option.

This looks very promising!

I am going to try it out this week :+1:

Also… down is really nice (at least once I realised I needed #use "topfind" in my ~/.ocamlinit): it turns the base ocaml shell into something that feels much more modern and friendly. It was a new discovery for me, via one of the links above.

I also noticed that ocamldebug just bails out if there’s Lwt involved, and ppx_debug from above says something about no concurrency support…

But if it’s only concurrency and not parallelism, shouldn’t interactive debugging still be possible?

In Python I can’t await an async function from the pdb prompt, but I can set breakpoints, navigate the stack, step forward, etc. with no problem.

It might be useful to think about the problem a bit more generally than “debugging”. Observability might be a better term, where debugging is just one way of observing the behaviour of a program; there are other tools available too.

Debugging concurrent (or distributed) programs can be challenging with “printf”-style debugging, but not impossible: make sure you log the thread id, and include a unique identifier for related pieces of computation, as a prefix or elsewhere in your log messages. Logs can then be post-processed to extract a per-request view of what happened (e.g. in the case of a web server); sometimes this “post-processing” is done with grep (or ripgrep if you’ve got gigabytes to wade through).
This works (it is the main form of debugging in the XAPI project…), but it can be quite tedious, and it becomes problematic when there is more than a single request to follow (e.g. if you call into other threads or external components, or the same program running on another host, etc.).
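
A minimal sketch of the tagging-and-filtering approach, assuming a plain-text log with one message per line. The function names and the "[thread N] [request-id]" prefix format are made up for illustration. In a real threaded program the thread id could come from Thread.id (Thread.self ()) in the threads library; here it is passed explicitly so the sketch needs only the standard library.

```ocaml
(* Prefix every log line with the thread id and a request id so that one
   request can later be pulled out of an interleaved log. *)
let format_line ~thread_id ~req_id msg =
  Printf.sprintf "[thread %d] [%s] %s" thread_id req_id msg

(* Keep only the lines tagged with [req_id], a tiny stand-in for grepping
   the log file for the request tag. The closing bracket in the tag keeps
   req-4 from matching req-42. *)
let lines_for_request req_id lines =
  let tag = "[" ^ req_id ^ "]" in
  let contains line =
    let n = String.length tag and m = String.length line in
    let rec at i = i + n <= m && (String.sub line i n = tag || at (i + 1)) in
    at 0
  in
  List.filter contains lines

let () =
  let log =
    [ format_line ~thread_id:1 ~req_id:"req-41" "start";
      format_line ~thread_id:2 ~req_id:"req-42" "start";
      format_line ~thread_id:1 ~req_id:"req-41" "db query";
      format_line ~thread_id:2 ~req_id:"req-42" "done" ]
  in
  List.iter print_endline (lines_for_request "req-42" log)
```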

We’ve been experimenting lately with OpenTelemetry (there is an OCaml library that supports the format), and I think that might be the answer for tracing large, distributed (or concurrent) systems. It is an open format with various visualization tools to choose from once a trace has been recorded; tracing can be turned on/off at runtime, you can choose what to sample (everything, everything that originates from a certain API call, etc.), and you can later batch or post-process/filter the captured trace.
It (currently) requires you to manually instrument your (API) entry points, and then you get a hierarchical/nested view of how an API call was handled, including all the nested API calls it made, the time each of them took (allowing you to spot bottlenecks more easily), and the logs each call produced. All that is required to make this work is a library supporting the format and a modification to the code to plumb through a “context id”. That may seem quite an invasive change, but it may be worthwhile for the flexibility and observability you gain. Since the format is open, this also works across languages (if you have multiple components written in different languages all interacting with each other).
When this works, it promises to be better than a visual debugger in some respects.
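
To illustrate what “plumbing through a context id” buys you, here is a toy in-process tracer. It is not the OpenTelemetry OCaml library or its API; ctx, span, root_span and child_span are hypothetical names, and Sys.time measures processor time rather than wall-clock time. Each span records its parent, which is exactly the information needed to rebuild the nested view of a request.

```ocaml
(* Context handed explicitly to callees: which trace, and which span. *)
type ctx = { trace_id : string; span_id : int }

(* A finished span, with a link to its parent (None for the root). *)
type span = {
  trace : string;
  id : int;
  parent : int option;
  name : string;
  cpu_seconds : float;  (* measured with Sys.time, i.e. processor time *)
}

let finished : span list ref = ref []  (* most recently finished first *)
let last_id = ref 0

let run_span ~trace_id ~parent name (f : ctx -> 'a) : 'a =
  incr last_id;
  let ctx = { trace_id; span_id = !last_id } in
  let t0 = Sys.time () in
  let record () =
    finished :=
      { trace = trace_id; id = ctx.span_id; parent; name;
        cpu_seconds = Sys.time () -. t0 }
      :: !finished
  in
  Fun.protect ~finally:record (fun () -> f ctx)

(* Start a new trace, e.g. at an API entry point. *)
let root_span ~trace_id name f = run_span ~trace_id ~parent:None name f

(* Open a child span: the only plumbing needed is the [ctx] argument. *)
let child_span (ctx : ctx) name f =
  run_span ~trace_id:ctx.trace_id ~parent:(Some ctx.span_id) name f

(* Hypothetical request handler instrumented at its entry points. *)
let handle_request () =
  root_span ~trace_id:"req-1" "handle_request" (fun ctx ->
      child_span ctx "db_query" (fun _ -> ());
      child_span ctx "render" (fun _ -> ()))

let () =
  handle_request ();
  List.iter
    (fun s ->
      Printf.printf "%s span %d (parent %s): %s\n" s.trace s.id
        (match s.parent with Some p -> string_of_int p | None -> "none")
        s.name)
    (List.rev !finished)
```

The extra ctx argument on every instrumented function is the invasive part mentioned above; in exchange, parent/child relationships stay unambiguous even when many requests interleave.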

Although it doesn’t allow you to break into the program and inspect and modify state that you didn’t add code to inspect ahead of time, I often find that such breakpoints would be difficult to use in a distributed system anyway (the other parts of the system would want to carry on, time out, etc. while you’re blocked in the debugger), and some bugs/issues/race conditions can only be observed by recording the events as they happen and inspecting them later. Obviously there is the danger that recording the events prevents the bug from being reproduced (a heisenbug), but most bugs aren’t like that.

Beyond “printf” debugging and tracing (whether with OpenTelemetry or otherwise), there are 2 other “traditional” debugging techniques that worked quite well with OCaml for me:

  • timer-based statistical stack sampling (“flamegraphs”). If things are stuck, or just a lot slower than expected, it is very useful to visualize where, and as long as you tell it to use DWARF info to unwind the stack samples, it works reasonably OK with OCaml and can at least pinpoint functions that may be unexpected bottlenecks. It might also be useful to build something into your program that lets you query which “high-level” operation each thread is doing, so you can identify things that are “stuck” more easily.
    I recently used this to fix a bug that took a function from ~18% CPU down to 0.02%, buried deep in an OCaml-C binding library where no one would’ve expected to look for such bugs (the bug was in the OCaml code), and further investigation revealed that there was a correctness bug hiding behind the performance bug too (this time in the C code it was interfacing with…)

  • the new memory profiling introduced in the latest versions of OCaml can sometimes point out surprising properties of your code, e.g. places where allocations happen far too often. That may not be a bug, but it is a candidate for optimization/simplification and can benefit both performance and correctness (if you were not expecting that piece of code to allocate that much, what else have you missed about the code’s behaviour?)
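
Once a profiler has pointed at a suspicious spot, a cruder check can confirm it. The sketch below is not the statistical memory profiler mentioned above; it simply diffs Gc.allocated_bytes around a call, and the two join functions are hypothetical examples of an allocation-heavy and an allocation-light version of the same code.

```ocaml
(* Run [f] and report roughly how many bytes it allocated
   (minor + major heap, as counted by the GC). *)
let bytes_allocated (f : unit -> 'a) : 'a * float =
  let before = Gc.allocated_bytes () in
  let result = f () in
  (result, Gc.allocated_bytes () -. before)

(* Quadratic allocation: every iteration copies the whole string so far. *)
let join_concat n =
  let s = ref "" in
  for i = 1 to n do s := !s ^ string_of_int i ^ "," done;
  !s

(* Same result, amortised growth via a Buffer. *)
let join_buffer n =
  let b = Buffer.create 16 in
  for i = 1 to n do
    Buffer.add_string b (string_of_int i);
    Buffer.add_char b ','
  done;
  Buffer.contents b

let () =
  let n = 2000 in
  let s1, b1 = bytes_allocated (fun () -> join_concat n) in
  let s2, b2 = bytes_allocated (fun () -> join_buffer n) in
  assert (s1 = s2);
  Printf.printf "concat: %.0f bytes, buffer: %.0f bytes\n" b1 b2
```

The absolute numbers depend on the OCaml version and backend; the point is the order-of-magnitude gap between the two versions.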

And finally, while this won’t help you debug things “in production”, writing a small unit test to exercise the property you’re looking at and then loading the code up with dune utop can help you understand the code’s behaviour (and when you’re done you also have a regression test, so you don’t reintroduce the bug you’re fixing…).

If you want to take things a step further, you can write a quickcheck-style property test (there are various implementations for this in OCaml: crowbar, monolith, qcheck-stm, etc.), which will find a bug and hand you a (minimized) input and the buggy output, all ready for bugfixing. This is more bug-hunting (e.g. in a particularly unstable/untrustworthy-looking piece of code) than debugging.
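
To show what a property test does under the hood, here is a hand-rolled checker: random inputs, plus greedy shrinking of a failing input. It only illustrates the idea and is not the API of QCheck, Crowbar or Monolith; gen_list, shrink, minimize, check and the buggy dedup are all hypothetical.

```ocaml
(* Random int lists with small elements, so duplicates are common. *)
let gen_list () = List.init (Random.int 10) (fun _ -> Random.int 5)

(* Shrink candidates: every list obtained by dropping one element. *)
let shrink l = List.mapi (fun i _ -> List.filteri (fun j _ -> j <> i) l) l

(* Greedily shrink a failing input for as long as it keeps failing. *)
let rec minimize prop l =
  match List.find_opt (fun l' -> not (prop l')) (shrink l) with
  | Some smaller -> minimize prop smaller
  | None -> l

(* Return [Some minimal_counterexample], or [None] if all trials pass. *)
let check ?(trials = 1000) prop =
  let rec loop n =
    if n = 0 then None
    else
      let l = gen_list () in
      if prop l then loop (n - 1) else Some (minimize prop l)
  in
  loop trials

(* Hypothetical buggy function: meant to remove all duplicates,
   but it only removes adjacent ones. *)
let rec dedup = function
  | x :: (y :: _ as rest) when x = y -> dedup rest
  | x :: rest -> x :: dedup rest
  | [] -> []

let no_duplicates l = List.length (List.sort_uniq compare l) = List.length l

let () =
  Random.init 42;
  match check (fun l -> no_duplicates (dedup l)) with
  | None -> print_endline "no counterexample found"
  | Some l ->
      Printf.printf "counterexample: [%s]\n"
        (String.concat "; " (List.map string_of_int l))
```

With these definitions the reported counterexample is always a three-element list of the form [a; b; a]: dropping any further element makes the bug disappear, which is the “minimized input ready for bugfixing” described above.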

Also, when designing your program it is useful to keep debuggability/observability in mind: if this goes wrong, do I have all the information needed to debug it? Should I log something, and if I do, is the log message unique enough? (Using one of __FUNCTION__, __LOC__, etc. can help here.)
If I want to turn logging on/off at runtime can I do that easily and reliably?
If I want to load this up and debug it in utop can I do that, or did I put too much logic into a single huge function where I cannot inspect intermediate state from utop?
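
A minimal sketch of the first two questions, assuming a home-grown logger rather than any particular logging library: each message is tagged with __FUNCTION__ (available since OCaml 4.12) and logging can be switched on and off at runtime. debug_enabled, sink, debug and parse_records are hypothetical names.

```ocaml
(* Runtime switch and output destination for debug messages. *)
let debug_enabled = ref false
let sink : (string -> unit) ref = ref prerr_endline

(* [where] is meant to receive __FUNCTION__ or __LOC__, so each message says
   where it came from. When logging is off, Printf.ifprintf discards the
   arguments without formatting anything. *)
let debug ~where fmt =
  if !debug_enabled then
    Printf.ksprintf (fun msg -> !sink (where ^ ": " ^ msg)) fmt
  else Printf.ifprintf () fmt

(* Hypothetical function instrumented with a location-tagged message. *)
let parse_records lines =
  let n = List.length lines in
  debug ~where:__FUNCTION__ "parsed %d records" n;
  n

let () =
  ignore (parse_records [ "a"; "b" ]);  (* logging off: prints nothing *)
  debug_enabled := true;
  ignore (parse_records [ "a"; "b" ])   (* prints e.g. "Foo.parse_records: parsed 2 records" *)
```

Keeping parse_records small and returning its intermediate result also answers the third question: it can be called directly from utop.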

I don’t think tracing is always a valid replacement for debuggers, but it’s certainly useful, especially for performance issues.

Aside from opentelemetry (which is good for networking and distributed systems), for local programs that are CPU-heavy or interactive, like games, I recommend Tracy (GitHub - wolfpld/tracy: C++ frame profiler), for which we at Imandra have a small OCaml library (GitHub - imandra-ai/ocaml-tracy: Bindings to the Tracy profiler). It’s super lightweight at runtime and is widely used for video game performance analysis. It’s based on manual instrumentation (no context to carry around). It can also sample stack traces (although IIRC in OCaml they’re not great).
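
For contrast with the explicit-context tracer idea from the OpenTelemetry discussion, here is a toy “zone” profiler where nesting comes from a global depth counter, so no context is passed around. This is not the ocaml-tracy API; with_zone, depth and closed are hypothetical names, the timings are processor time from Sys.time, and the global state makes it valid for a single thread only (a real profiler would need per-thread state).

```ocaml
(* Current nesting depth; no context argument is threaded through calls. *)
let depth = ref 0

(* (zone name, nesting depth, CPU seconds), most recently closed first. *)
let closed : (string * int * float) list ref = ref []

let with_zone name f =
  let d = !depth in
  incr depth;
  let t0 = Sys.time () in
  Fun.protect
    ~finally:(fun () ->
      decr depth;
      closed := (name, d, Sys.time () -. t0) :: !closed)
    f

(* Hypothetical game-loop frame with two instrumented sub-steps. *)
let frame () =
  with_zone "frame" (fun () ->
      with_zone "physics" (fun () -> ());
      with_zone "render" (fun () -> ()))

let () =
  frame ();
  List.iter
    (fun (name, d, secs) ->
      Printf.printf "%s%s %.6fs\n" (String.make (2 * d) ' ') name secs)
    (List.rev !closed)
```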

Interesting. Is the recommended way to use it a Git submodule of that repository within a project? Is there anything special needed to set it up in terms of dune configuration?

Interesting discussion, for sure. All these things are nice to have and use but, as c-cube said, they do not replace the debugger. Most of the time you need to execute your program step by step, inspect the variables as they change, see the execution flow, etc. It is just very handy (and time-saving) to do this from inside your editor/IDE.

When I found earlybird for VS Code, I assumed that I would have a dream dev environment for OCaml. But alas…

I don’t know what is causing this problem for you, but I am able to debug in ocamldebug even if Lwt is involved.

The issue may be multiple system threads: if I remember correctly, the debugger does not like that, and Lwt may have created one.

Also, you may want to try OCaml 4.14 / 5.0.0~beta2 and see if the problem still occurs for you. There were some issues with older versions of ocamldebug: it simply refused to work if there were system threads. In OCaml 4.14, ocamldebug will work until a new system thread is spawned. Anyway, this is just from memory and I am probably wrong about some fine details.

TL;DR – use OCaml 4.14+, you should be able to debug a single (system) threaded program that uses Lwt.

Yes, it was that.

I think I was on 4.13 at the time so if there were fixes for this area in 4.14 I’ll try it again, thanks!

If I remember correctly, you just have to opam pin it with a git path, and it should work. There is no particular dune magic to use it.

It uses a dune virtual library for tracy (which is the lightweight instrumentation side, pretty low cost), which is what you can use in most of your code. Then, in your main binary, you add tracy-client as a dep and that’s where the bindings come into play, if you call Tracy.enable(). There’s an example that does exactly that.
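
The cost model described here (cheap calls everywhere, a real backend only in the main binary) can be sketched in plain OCaml. Note the difference: a dune virtual library picks the implementation at link time, whereas this sketch swaps it at runtime through a first-class module. BACKEND, Noop, Recording, with_scope and enable are hypothetical names, not the ocaml-tracy API.

```ocaml
(* Interface that instrumented library code depends on. *)
module type BACKEND = sig
  val enter : string -> unit
  val leave : string -> unit
end

(* Default backend: does nothing, so instrumentation is nearly free. *)
module Noop : BACKEND = struct
  let enter _ = ()
  let leave _ = ()
end

let backend : (module BACKEND) ref = ref (module Noop : BACKEND)

(* What library code would call everywhere. *)
let with_scope name f =
  let module B = (val !backend : BACKEND) in
  B.enter name;
  Fun.protect ~finally:(fun () -> B.leave name) f

(* A recording backend that only the main binary opts into. *)
let events : string list ref = ref []

module Recording : BACKEND = struct
  let enter name = events := ("enter " ^ name) :: !events
  let leave name = events := ("leave " ^ name) :: !events
end

let enable () = backend := (module Recording : BACKEND)

let () =
  with_scope "startup" (fun () -> ());  (* not enabled yet: nothing recorded *)
  enable ();
  with_scope "update" (fun () -> ());
  List.iter print_endline (List.rev !events)
```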

Does it require an OCaml compiler with frame pointers to output useful instrumentation?

It’s based on manual instrumentation, so not necessarily. You can add scopes where you want.

If the tracy process has enough permissions (e.g. run as root), it can also capture a lot more data by itself, including stack samples, in which case I think it benefits from debug symbols. It’s a tool primarily designed for C++ and the like. Quoting the manual:

On gcc or clang remember to specify the debugging information -g parameter during compilation and do not add the strip symbols -s parameter. Additionally, omitting frame pointers will severely reduce the quality of stack traces, which can be fixed by adding the -fno-omit-frame-pointer parameter. Link the executable with an additional option -rdynamic (or --export-dynamic, if you are passing parameters directly to the linker).

It’s able to read OCaml debug info though; see here in the middle, it points to the line instrumented in the example program.

Here is what I find useful.

$ cd _build/default
# run ocamldebug -- for your specific example this would be:
$ ocamldebug tests/test_oktree.bc
(ocd) info modules

The advantage to running from within _build/default is that ocamldebug seems to find paths more easily.

Also, in the dune-project file I use the (wrapped_executables false) stanza. It saves me from having to write weird things like break @Dune__exe__SomeModule 42; I can simply write break @SomeModule 42.

Thanks, I’ll try these tips!

I think they highlight that it’d be nice to have dune handle all these awkward details and “optional but kind of necessary” config hacks, though.

<slight_offtop>
Some time ago, I watched the beginning of the ‘Handmade Hero’ series, where the speaker claimed that (in 2015) GNU/Linux had much less convenient debuggers than Windows. It would be great to research that and understand what could be improved in general, and maybe OCaml could even end up with the most powerful debugger of all the languages/infrastructures available on GNU/Linux.
</slight_offtop>

@Kakadu I fully agree. Not having a debugger is a show stopper, at least for people that are used to that for decades.

The OCSF funded some of the work on ocamlearlybird IIRC, so some resources have been put into the problem. But if no one is willing to work on such a project for the long term, things won’t get better.

@Khady is this OCSF funding for ocamlearlybird a recent thing?

One question: some people mentioned that it’s possible to use gdb with OCaml, although it’s not as convenient, because you need to understand how OCaml values are represented at runtime, and all type information is stripped away.
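
To give a feel for what “understanding the runtime representation” means in gdb, the sketch below uses the (unsafe) Obj module to print the structure you would otherwise decode by hand from raw machine words: tagged immediates stored as 2n + 1, and blocks with a constructor tag and fields. It assumes plain data; blocks with special tags (closures, boxed floats, lazy values, objects, custom blocks) are not decoded, and cyclic values would make it loop.

```ocaml
(* Describe a value's runtime layout, roughly what gdb shows you raw. *)
let rec describe (v : Obj.t) : string =
  if Obj.is_int v then begin
    let n : int = Obj.obj v in
    (* immediates are tagged: the machine word holds 2n + 1 *)
    Printf.sprintf "int %d (word %d)" n ((2 * n) + 1)
  end
  else begin
    let tag = Obj.tag v in
    if tag = Obj.string_tag then Printf.sprintf "string %S" (Obj.obj v : string)
    else if tag < Obj.lazy_tag then
      (* ordinary block: constructor tag plus [size] fields *)
      let fields = List.init (Obj.size v) (fun i -> describe (Obj.field v i)) in
      Printf.sprintf "block tag=%d [%s]" tag (String.concat "; " fields)
    else Printf.sprintf "block tag=%d (not decoded)" tag
  end

let () =
  print_endline (describe (Obj.repr (Some 3)));
  print_endline (describe (Obj.repr [ 1; 2 ]));
  print_endline (describe (Obj.repr ("id", 3.14)))
```

For example, Some 3 is a block with tag 0 whose single field is the word 7, and the empty list at the end of [1; 2] is the immediate 0, stored as the word 1.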

Having said that, there must be some graphical interfaces for gdb, even on Linux, right? Could those be used with OCaml?
