Catching panics in Rust

I’m doing a lot of FFI work with OCaml and Rust (OCaml using Rust code) and I’m running into some issues that are hard for me to explain (due to my lack of knowledge of the internals).

Perhaps someone can think of something here, or knows what I should read to understand these things better.

So, weird things happen when I panic in Rust, and I don’t really get clean stack traces. What I’ve been trying to do is move panics to the OCaml side and raise an exception there instead when I can. But that doesn’t always work: sometimes there’s a bug and a panic still happens. In those cases I want to be able to investigate and debug to find the source of the problem.

My solutions have been to try to either:

  1. catch the panic in Rust and do something with it (see “Better backtrace when Rust panics”, issue #117 in zshipko/ocaml-rs on GitHub).
  2. create an error in Rust that contains a backtrace (using thiserror).

but both ways end up giving me weird ABRT signals and

```
fatal runtime error: failed to initiate panic, error 5
```

I'm guessing there's something fundamentally wrong about trying to play with Rust's panics in an OCaml binary, but I'm not sure.

EDIT: catching a panic looks like this from OCaml:

thread panicked while processing panic. aborting.
File "src/lib/pickles/dune", line 2, characters 1-15:
2 | (inline_tests)
Command got signal ABRT.

Is it actually kosher to catch a panic? I thought that Rust gave few guarantees about the state of the runtime when a panic happens, and that you were meant to abort the program and never try to recover?

Good question. (Note that in the case where I’m just creating a backtrace I’m not even trying to catch a panic.)

FYI, I’ve used this API from Rust: std::panic::set_hook.

I’ll read more about it now.

EDIT: it looks like the default panic strategy is “unwind”, which unwinds the stack and runs destructors (freeing memory) on the way, but you can also set the strategy to “abort”, which does not. In my case I’m wondering if the OCaml exceptions and the Rust panics are colliding somehow.

In some configurations the default panic strategy is abort rather than unwind (e.g. for some cross-compilation targets). Your issue sounds similar to the Stack Overflow question “catch_unwind signal SIGABRT when unwrap a Result inside it”. That post does not have a solution, but does explicitly setting the panic strategy to unwind help?
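One way to check which strategy a given build actually uses is the panic cfg predicate. This is only a sketch, and panic_strategy is a made-up helper name. As far as I know, the strategy itself is chosen with the panic key of a Cargo profile or with -C panic=... on the rustc command line.

```rust
/// Hypothetical helper: reports the panic strategy this crate was compiled with.
/// `cfg!(panic = "...")` is evaluated at compile time.
pub fn panic_strategy() -> &'static str {
    if cfg!(panic = "unwind") {
        "unwind"
    } else {
        "abort"
    }
}
```

If this returns "abort", catch_unwind cannot catch anything, because the process is terminated as soon as a panic starts.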

As for OCaml exceptions and Rust panics colliding, this is possible in principle (but I am not sure this happens in your example). Unwinding across OCaml frames or raising OCaml exceptions across Rust frames is undefined behaviour. You have to convert between exceptions and panics at boundaries.[1]

  1. Ideally, this should be handled by the FFI package (ocaml-rs, ocaml-interop). Moreover, I think that round-trip conversion between exceptions and panics is possible with more effort (conditional on an appropriate use of exceptions on the OCaml side), but for simple error reporting this is probably not needed.
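To illustrate what converting at the boundary can look like on the Rust side, here is a minimal sketch using only the standard library. The function names and status codes are invented for the example, and it is not how ocaml-rs or ocaml-interop do it; turning the status into an OCaml exception would be the job of the binding layer.

```rust
use std::panic;

// Hypothetical status codes agreed on with the foreign caller.
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANIC: i32 = -1;
pub const STATUS_NULL: i32 = -2;

/// Hypothetical operation that panics on bad input.
fn divide(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// Entry point meant to be called from foreign code.
/// A real export would also need a no_mangle attribute.
pub extern "C" fn checked_divide(a: i32, b: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return STATUS_NULL;
    }
    // Catch the panic here so that unwinding never reaches foreign frames.
    match panic::catch_unwind(move || divide(a, b)) {
        Ok(value) => {
            // Safe only because `out` was checked and the caller promises
            // it points to a valid, writable i32.
            unsafe { *out = value };
            STATUS_OK
        }
        // The payload is dropped; the caller only learns that a panic occurred.
        Err(_) => STATUS_PANIC,
    }
}
```

The default panic hook still prints the panic message to stderr before catch_unwind returns, so the diagnostic is not lost.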

Panics can be caught; this dates from the time when Rust’s exception-handling story was inspired by Erlang (“let it fail”: a thread should be able to fail without disturbing the rest of the program). panic::catch_unwind has the UnwindSafe bound on its closure argument, which is meant to ensure that any value escaping the closure remains in a consistent state if a panic occurs. This bound does not cover the case of catching panics at FFI boundaries, though, where there are many other risks.
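A small sketch of what the UnwindSafe bound guards against (push_then_fail is a made-up name): a closure that captures a mutable reference is rejected by catch_unwind unless it is wrapped in AssertUnwindSafe, because the caller can observe whatever half-finished state the panic left behind.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical example: mutates `log`, then panics half-way through.
/// Returns true if a panic was caught.
pub fn push_then_fail(log: &mut Vec<&'static str>) -> bool {
    // `&mut Vec<_>` is not UnwindSafe, so the closure has to be wrapped
    // explicitly; this is the programmer asserting that the state is
    // still usable after a panic.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        log.push("started");
        panic!("failed half-way");
    }));
    result.is_err()
}
```

After the call, log contains "started" but nothing marking completion, which is exactly the kind of partial update the bound asks you to think about.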

I have no experience with catching panics in Rust, but my limited understanding is:

  • all bets are off
  • print some error msg, and terminate

I think the solution here is to engineer the Rust side to return a Result instead of trying to catch panics.
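As a rough sketch of that approach, combined with the idea of an error that carries a backtrace: the example below uses only the standard library (std::backtrace) rather than thiserror, and ParseError and parse_port are names invented for illustration.

```rust
use std::backtrace::Backtrace;
use std::fmt;

/// Hypothetical error type that records where it was created.
#[derive(Debug)]
pub struct ParseError {
    message: String,
    backtrace: Backtrace,
}

impl ParseError {
    fn new(message: impl Into<String>) -> Self {
        ParseError {
            message: message.into(),
            // force_capture ignores RUST_BACKTRACE and always tries to capture.
            backtrace: Backtrace::force_capture(),
        }
    }

    pub fn backtrace(&self) -> &Backtrace {
        &self.backtrace
    }
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl std::error::Error for ParseError {}

/// Returns an error value instead of panicking on bad input.
pub fn parse_port(input: &str) -> Result<u16, ParseError> {
    input
        .trim()
        .parse::<u16>()
        .map_err(|e| ParseError::new(format!("invalid port {:?}: {}", input, e)))
}
```

The caller can then print err.backtrace() or pass the message across the FFI boundary, and no unwinding is involved at any point.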

Yes, that is my assessment, too.

I’m wondering if during unwinding Rust is also trying to free some memory that’s still alive from OCaml’s perspective…

I’m also wondering why Rust seems to be “all bets are off” if you can also cleanly catch panics in Rust

There are other nasty things besides this (so this is a lower bound on the danger of catching panics, not the only danger). The example I often read about is this:

  1. we acquire the lock of an Arc<Mutex<...>>
  2. we panic
  3. we catch the panic

What do we do now with the Mutex?

If we release it, well, now the system is in some inconsistent state / we broke assumptions about the Mutex.

If we continue to hold it, we’re probably going to deadlock something.

At this point, it seems like we might as well terminate the program.

Again, I’m not saying this is the only bad thing that comes from catching panics; it is just one of many things that can go wrong.

My understanding is that the standard library poisons the mutex at this point: the lock is released during unwinding, but anybody who later locks it gets an error, which they can either unwrap (and so panic in turn) or handle more gracefully.
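A minimal sketch of that poisoning behaviour, using only std::sync and std::thread (poison_and_recover is a made-up name):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Panics in a worker thread while the lock is held, then reports whether
/// the mutex was poisoned and what data it still contains.
pub fn poison_and_recover() -> (bool, Vec<i32>) {
    let shared = Arc::new(Mutex::new(vec![1, 2, 3]));

    let worker = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut guard = shared.lock().unwrap();
            guard.push(4);
            // The guard is dropped during unwinding, which marks the
            // mutex as poisoned.
            panic!("worker failed while holding the lock");
        })
    };
    // join returns Err because the worker panicked.
    let _ = worker.join();

    let was_poisoned = shared.is_poisoned();
    // lock() now returns Err(PoisonError); into_inner still gives access
    // to the guard for callers that decide the data is usable.
    let data = match shared.lock() {
        Ok(guard) => guard.clone(),
        Err(poisoned) => poisoned.into_inner().clone(),
    };
    (was_poisoned, data)
}
```

Whether the recovered data can be trusted is still a judgment call for the caller; the poison flag only tells you that a panic happened while the lock was held.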

Looking at the issue you filed, in the first case it seems that you replaced the ocaml-rs panic hook with your own. ocaml-rs used to convert panics into exceptions at the boundary, but this seems to have been changed to raising an OCaml exception directly from the panic hook. (I do not think this is a good thing to do, but that is off-topic for this thread.) So when you replaced the panic hook, you changed the panic behaviour back to normal unwinding and started unwinding through OCaml frames (which is undefined behaviour), since ocaml-rs no longer had any other protection against panics.

As advised by Zach in the issue above, I would try calling the backtrace crate from the ocaml-rs panic handler, that is without changing the existing behaviour. (In the longer term it would be good to go back to re-raising the panic as an exception at the moment of returning to OCaml rather than from the panic hook, but this is another topic.)
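For illustration, here is one way to add a backtrace without discarding the hook that is already installed: take the current hook and call it from the new one. This sketch uses std::backtrace instead of the backtrace crate, install_backtrace_hook and PANICS_SEEN are invented names, and whether it composes correctly with the hook that ocaml-rs installs is an assumption I have not verified.

```rust
use std::backtrace::Backtrace;
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts panics seen by the hook (only here to make the effect observable).
pub static PANICS_SEEN: AtomicUsize = AtomicUsize::new(0);

/// Installs a hook that prints a backtrace and then defers to whatever
/// hook was installed before, so the existing behaviour is preserved.
pub fn install_backtrace_hook() {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        PANICS_SEEN.fetch_add(1, Ordering::SeqCst);
        eprintln!("panic backtrace:\n{}", Backtrace::force_capture());
        // Call the previous hook last, in case it does not return normally.
        previous(info);
    }));
}
```

It would have to be called after the library has installed its own hook; otherwise the library's hook simply replaces this one.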

Personally, if I am not smart enough to prevent panics, I am certainly not smart enough to reason about what will continue to work or break after a panic, when there are poisoned Mutexes among other things.

Thus, in theory, I agree with you that you could carefully reason about post-panic code; in practice, I view it as all bets are off (with exception of printing some error msgs) – because if I’m not smart enough to prevent the panic, I’m probably not smart enough to reason about post-panic code.

As the discussion revolves around FFI work with OCaml and Rust, catching panics in Rust is indeed relevant and can be done safely when the panic is converted into an exception at the boundary. Of course, it is essential that the OCaml code takes these exceptions just as seriously as a panic. It is in fact instructive for us to correctly understand the panic mechanism as it teaches a lot about how one can write exception-safe OCaml code.

You raise valid concerns about reasoning about post-panic code in the general case, but one of the main reasons behind panics is to simplify dealing with post-panic situations. It can actually be easier to reason about post-panic code than figuring out all the possible sources of panic. By using Rust’s mechanisms, such as destructors to clean up state and the UnwindSafe trait, one can be more confident in the safety and consistency of the code after a panic. The key is to choose a suitable location for recovery, one where reasoning about the state of the program is manageable.
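A small sketch of the destructor part of this argument (Cleanup, CLEANED_UP and run_task_with_recovery are hypothetical names): a value's Drop implementation runs while the stack unwinds, so cleanup has already happened by the time the recovery point sees the error.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by the destructor so the cleanup is observable from outside.
pub static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// Hypothetical guard whose destructor releases some resource.
struct Cleanup;

impl Drop for Cleanup {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

/// Runs a task that panics and turns the panic into an error message
/// at a single, well-defined recovery point.
pub fn run_task_with_recovery() -> Result<u32, String> {
    let outcome = panic::catch_unwind(|| {
        let _guard = Cleanup;
        let values: Vec<u32> = Vec::new();
        values[0] // out of bounds: panics, and `_guard` is dropped while unwinding
    });
    outcome.map_err(|payload| {
        // Panic payloads are usually a &str or a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

The closure here captures nothing, which is what makes the recovery point easy to reason about: no shared state can be left half-updated.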

Think of it like force-closing a process in an operating system. The process may not have planned for every crash scenario, but the OS has mechanisms to safely close it and clean up resources without destabilizing the whole system.

Panics and Result have distinct roles in Rust, each with specific use cases. Replacing panics with Result is not a solution, as it would imply having to consider all possible error scenarios. External factors, like using an FFI, should not force the use of one error-handling mechanism over the other.

I’m disengaging from this discussion. I will leave with two links, from “the book”

It is pretty clear to me: panics are NOT exceptions; when a library panics, it is the library author saying “I give up”; catching panics and resuming execution as if everything is normal … is insanity to me.

Your first link does say that it will unwind cleanly, though. I think the language of “unrecoverable” is due to panics not being treated as exceptions in Rust. As in, they’re heavily discouraged, but it seems like you could write a safe Erlang-like supervisor/agent model in Rust by catching panics.
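A toy sketch of that supervisor idea, using thread boundaries instead of catch_unwind (worker and supervise are made-up names): JoinHandle::join returns Err when the thread panicked, so the supervisor can restart the worker.

```rust
use std::thread;

/// Hypothetical worker: fails on its first two attempts.
fn worker(attempt: u32) -> u32 {
    if attempt < 3 {
        panic!("attempt {} failed", attempt);
    }
    attempt * 10
}

/// Restarts the worker in a fresh thread until it succeeds or
/// `max_attempts` is reached.
pub fn supervise(max_attempts: u32) -> Option<u32> {
    for attempt in 1..=max_attempts {
        let handle = thread::spawn(move || worker(attempt));
        match handle.join() {
            Ok(value) => return Some(value),
            // The panic stayed inside the worker thread; the supervisor
            // only sees an Err and decides what to do next.
            Err(_) => eprintln!("worker panicked on attempt {}, restarting", attempt),
        }
    }
    None
}
```

Each worker gets only a copy of attempt and shares no state with the supervisor, which is what keeps the restart safe in this toy version.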

Indeed, Erlang was an inspiration for Rust’s panic model. For sources that explain the notion of panic safety maybe better than I did, you can check out the Nomicon and the documentation for UnwindSafe.

Thanks for the valuable discussion, as I believe this topic is crucial for the robust handling of serious exceptions in multicore OCaml too. These aspects are often overlooked, but they represent another aspect of Rust that seeks to bring the qualities of pure code to imperative programming. I appreciate the scepticism, and I would always be happy to continue the discussion.