Why are exception backtraces global?

I’m working on a library to capture exceptions and upload them to Sentry, and I’m running into a lot of trouble consistently getting backtrace information. It is especially hard when an exn is returned from a function (as ('a, exn) Result.t or one of the variants of Error.t, for example), or in the case of Async, where the Monitor knows the real backtrace but I can’t seem to get it out of the exn.

It also makes it really annoying to write a try/finally construct containing a callback correctly (which is only possible at all since 4.05.0):

  try
    f ()
  with e ->
    let backtrace = Caml.Printexc.get_raw_backtrace () in
    do_something ()
    >>| fun _ ->
    Caml.Printexc.raise_with_backtrace e backtrace
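For comparison, here is a rough sketch of the same pattern without Async, using only the standard library's Printexc. The names protect and finally are picked for illustration. The point it shows is that the backtrace has to be read before the cleanup runs, because the cleanup may itself raise and catch something and overwrite the global buffer.

```ocaml
(* Sketch of try/finally with the standard library only (OCaml >= 4.05).
   [protect] and [finally] are illustrative names. *)
let protect ~finally f =
  match f () with
  | result ->
    finally ();
    result
  | exception e ->
    (* Read the backtrace first: an exception raised and caught inside
       [finally] would overwrite the global backtrace buffer. *)
    let bt = Printexc.get_raw_backtrace () in
    finally ();
    (* Re-raise [e] with its original backtrace instead of a fresh one. *)
    Printexc.raise_with_backtrace e bt

(* Usage: the cleanup runs on both the normal and the exceptional path. *)
let () =
  let answer = protect ~finally:(fun () -> prerr_endline "cleanup") (fun () -> 42) in
  assert (answer = 42)
```

Since OCaml 4.08 the standard library ships Fun.protect, which covers roughly this case.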

All of this seems to be because exns share a global backtrace, but I’m wondering: why? It seems to make exceptions substantially less useful, and I’m not sure how bad the performance impact of per-exception backtraces would be. If there is a big performance hit, I’m guessing the compiler could continue secretly using global backtraces and copy them into the exception only when it detects that the exception can leave the current scope (returned or used in a closure). If you’re returning an exception then you presumably want the info in it for some reason, and you can remove the backtrace if you want. Maybe it would make the compiler more complicated, but it seems like it would substantially improve Async/Lwt exceptions and make it harder to mess up try/finally constructs.
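To make the "copy it into the exception when it leaves the scope" idea concrete, this is roughly what has to be done by hand today: pair the exn with its raw backtrace at the catch site and return both. It is only a sketch. The type captured and the helpers try_with, to_report and reraise are made-up names, and the backtrace is empty unless recording is enabled (OCAMLRUNPARAM=b or Printexc.record_backtrace true) and the program was built with debug info.

```ocaml
(* Hypothetical helpers: carry the backtrace next to the exception. *)
type 'a captured = ('a, exn * Printexc.raw_backtrace) result

let try_with (f : unit -> 'a) : 'a captured =
  match f () with
  | v -> Ok v
  | exception e ->
    (* Reading the backtrace must be the first thing the handler does,
       before anything else can overwrite the global buffer. *)
    Error (e, Printexc.get_raw_backtrace ())

(* Render exception and backtrace as text, e.g. for an error reporter. *)
let to_report (e, bt) =
  Printexc.to_string e ^ "\n" ^ Printexc.raw_backtrace_to_string bt

(* Re-raise later with the backtrace that was captured at the catch site. *)
let reraise (e, bt) = Printexc.raise_with_backtrace e bt

let () =
  match try_with (fun () -> failwith "boom") with
  | Ok () -> ()
  | Error err -> prerr_string (to_report err)
```

This only helps if every catch site cooperates, which is exactly the limitation described above.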


My experience with Lwt was that stack traces aren’t useful when the sequence of function calls doesn’t accumulate on the stack (!).

The workaround is to resort to at least some syntax extension to capture locations at the bind points, e.g. a () >>= b would produce a trace that lets you know that you called b after you called a, and doesn’t show other things that happened in between. This is the part that’s tremendously useful, as it gives a hint as to the path that led to some error, i.e. relevant context. Now this path doesn’t have to be complete. In the case of an infinite loop, we don’t want to accumulate a list of locations that grows forever, so it would be good to keep only the last 100 locations or so.

Where I’d like to go is that maybe it would be good to have a generic mechanism for passing location info to infix operators like >>= without resorting to syntax extensions. Keeping an efficient and memory-safe trace would be the responsibility of the implementors of the infix operator.

Something like this, where @loc is magically passed to (>>=) as if in normal OCaml we were writing (>>=) __LOC__ a b:

let (>>=) @loc a b =
  let (trace_info, computation_a) = a in
  let trace_info = add_location trace_info loc in
  wrap_and_schedule trace_info b
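In today's OCaml the closest approximation is to pass __LOC__ by hand. Below is a toy sketch of that, not how Lwt or Async work: a "computation" is just a value paired with its trace, nothing is scheduled, and the names traced, bind, take and max_locations are all made up. It also caps the trace at the last 100 locations, as suggested above.

```ocaml
(* Toy model: a value plus the bind locations seen so far.
   All names are illustrative. *)
let max_locations = 100

type 'a traced = {
  trace : string list; (* most recent location first *)
  value : 'a;
}

let return value = { trace = []; value }

(* Keep at most [n] leading elements of a list. *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* The caller passes the location explicitly; the proposed @loc would
   have the compiler supply it instead. *)
let bind ~loc a f =
  let trace = take max_locations (loc :: a.trace) in
  let b = f a.value in
  (* Locations recorded inside [f] are more recent, so they go first. *)
  { b with trace = take max_locations (b.trace @ trace) }

let result =
  bind ~loc:__LOC__ (return 1) (fun x ->
    bind ~loc:__LOC__ (return (x + 1)) (fun y -> return (y * 2)))

(* Print the recorded bind locations, most recent first. *)
let () = List.iter print_endline result.trace
```

Writing ~loc:__LOC__ at every call site is the noise that the @loc idea is meant to remove.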

Yeah, I guess the other piece of this is that stack traces really need a way for programs to edit them (so Lwt and Async could alter the stack trace to be more useful). I assume Lwt is doing the same thing as Async, where there’s a custom exception type that accumulates its own backtrace. That gets you useful backtraces printed, but makes it nearly impossible for libraries like mine to inspect stack traces in a generic way (and it’s presumably less efficient than just storing one backtrace on the exception instead of also storing an unusable backtrace globally).

If I understand correctly, the Lwt.catch function unwraps a resolved promise which can be either a regular value or an exception. If it’s an exception, it somehow triggers a series of artificial try-with/re-raise at the location of the original bind points, creating an actual stack trace with locations that correspond to the non-exceptional propagation of the exn value from promise to promise. Now the user code that’s given the exception to handle can print a stack backtrace and it will look just right.