Lwt - how to catch exceptions raised by signal handlers?

lwt

#1

I have some Lwt code that opens sockets, and I’d like to make sure that they get closed when the program is interrupted by SIGINT or SIGTERM. I’ve registered the handlers like this:

  let register_handlers () =
    Lwt_unix.on_signal Sys.sigterm (fun _ -> failwith "Caught SIGTERM");
    Lwt_unix.on_signal Sys.sigint (fun _ -> failwith "Caught SIGINT")

And the exception is succesfully raised when I press CTRL+C: (Failure "Caught SIGINT"). However, the cleanup code in the Lwt.finalize and Lwt.catch blocks does not catch this exception, and does not get executed for some reason. Is there a way of catching the exceptions raised by the handler in Lwt?

My code looks roughly like this, but I tried calling register_handlers from different places:

Lwt_unix.accept sock
>>= fun (fd, _) ->
Lwt.finalize
  (fun () -> register_handlers (); ... <handle client>)
  (fun () -> Lwt_unix.close fd)

I’d be grateful for any help you can provide :slight_smile:


#2

It does seem like a bug:

open Lwt.Infix

let () =
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
      failwith "Caught SIGINT")
  |> ignore;

  let p =
    Lwt.finalize
      (fun () ->
        Lwt_unix.sleep 2. >>= fun () ->
        raise Pervasives.Exit)
      (fun () ->
        prerr_endline "handled exception";
        Lwt.return ())
  in

  Lwt_main.run p

(* ocamlfind opt -linkpkg -package lwt.unix foo.ml && ./a.out *)

This behaves differently depending on whether you interrupt it with Ctrl+C, or let it raise Exit. Also, OCaml signal handling is somewhat strange. I don’t remember the full details at the moment. I opened an issue for addressing it:

https://github.com/ocsigen/lwt/issues/451


#3

Thanks a lot for reporting the issue! :slight_smile:


#4

Thank you, you reported it :slight_smile:


#5

Thanks a lot for looking into this issue! :slight_smile:
While trying to find a workaround, I realized that if I have multiple signal handlers that raise an exception, only one of them gets executed:

open Lwt.Infix

let () =
 Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       print_endline "Caught SIGINT 1"; failwith "SIGINT 1")
  |> ignore;
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       print_endline "Caught SIGINT 2"; failwith "SIGINT 2")
  |> ignore;

  let p =
    Lwt.finalize
      (fun () ->
        Lwt_unix.sleep 2. >>= fun () ->
        raise Pervasives.Exit)
      (fun () ->
        prerr_endline "handled exception";
        Lwt.return ())
  in

  Lwt_main.run p

Output:

> ocamlfind opt -linkpkg -package lwt.unix foo.ml; and ./a.out
^CCaught SIGINT 1
Fatal error: exception Failure("SIGINT 1")

When I don’t raise an exception, both handlers get executed, but the program doesn’t stop:

open Lwt.Infix

let () =
 Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       print_endline "Caught SIGINT 1")
  |> ignore;
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       print_endline "Caught SIGINT 2")
  |> ignore;

  let p =
    Lwt.finalize
      (fun () ->
        Lwt_unix.sleep 2. >>= fun () ->
        raise Pervasives.Exit)
      (fun () ->
        prerr_endline "handled exception";
        Lwt.return ())
  in

  Lwt_main.run p

Output:

> ocamlfind opt -linkpkg -package lwt.unix foo.ml; and ./a.out
^CCaught SIGINT 1
Caught SIGINT 2
handled exception
Fatal error: exception Pervasives.Exit

So I think I have to raise an exception from the handler, to stop the program. So this means that I can have only one handler. So I came up with the following really ugly workaround, which seems to work:

  let cleanups = Hashtbl.create 4;;

  let cleanup_id = ref 0;;

  let register_cleanup_handler () =
    let handler s = Hashtbl.iter (fun _ c -> c ()) cleanups; failwith "Caught signal %s" s in
    Lwt_unix.on_signal Sys.sigterm handler |> ignore;
    Lwt_unix.on_signal Sys.sigint handler |> ignore

  let with_cleanup f cleanup =
    let i = !cleanup_id in
    incr cleanup_id;
    Hashtbl.add cleanups i cleanup;
    Lwt.finalize f (fun () ->
          Lwt_io.print "unregistering handler\n" >>= fun () ->
          Lwt.wrap (fun () -> Hashtbl.remove cleanups i) >>= fun () ->
          Lwt_io.print "unregistered handler\n")

The only issue is that I cannot use Lwt functions in the handler and in the cleanup functions :confused: .


#6

I investigated the issue. Details, proposal for change, and workaround available on this Github comment:

https://github.com/ocsigen/lwt/issues/451#issuecomment-325554763

Note that signal handlers can’t be made to automaticallly send an exception wherever you want because Lwt has no way of knowing where it is appropriate. There is a workaround using waiters. See comment linked above.


#7

Thanks for looking into the issue! I’ve realized that the example in my original post isn’t a good one, because both Linux and Windows automatically close open file descriptors, therefore I don’t have to worry about cleaning these up when the program terminates, I only have to ensure that the program isn’t leaking open file descriptors while it’s running.

However, I’ll probably still need to free some resources when the program is stopped with a SIGTERM. I’ve realized that the situation becomes more complicated sadly when I have two nested finalizers.

  • If I use the wakener only in the inner block, then both cleanups get executed in all cases. However, when I press CTRL+C immediately after startup, I still have to wait for the first sleep to finish - what I originally wanted is to terminate the program immediately and run all the cleanups in all finalizers.
  • If I use the wakener with the outer block, the cleanup for the inner Lwt.finalize does not get executed.
  • If I use the same wakener for both finalize blocks, then it works as I intended - it exits immediately and calls both cleanup functions:
(* compile with
   ocamlfind ocamlc -o prog -linkpkg -package lwt -package lwt.unix prog.ml
*)
open Lwt.Infix

let () =
  let w,u = Lwt.wait () in
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       Lwt.wakeup_exn u (Failure "Caught SIGINT"))
  |> ignore;

  let p =
    Lwt.finalize
      (fun () ->
         (Lwt_io.eprintl "block 1" >>= fun () ->
          Lwt_unix.sleep 2. >>= fun () ->
          Lwt.finalize
            (fun () ->
               (Lwt_io.eprintl "block 2" >>= fun () ->
                Lwt_unix.sleep 2. >>= fun () ->
                raise Pervasives.Exit)
               <?>
               w
            )
            (fun () -> Lwt_io.eprintl "cleanup 2")
          >>= fun () ->
          Lwt_unix.sleep 2.)
         <?>
         w
      )
      (fun () -> Lwt_io.eprintl "cleanup 1")
  in

  Lwt_main.run p

So it seems that using Lwt.wait allows one to have more control over what actions the program does before it terminates and at what point it terminates. But in my case, I only want to free up some resources before the program exits. So another workaround I was thinking of using is the one I mentioned above: using a global reference to keep track of the objects to be cleaned up. This workaround is less elegant and uses mutability, but I understand the control flow better in this case:

open Lwt.Infix

module StringSet = Set.Make(String)
let cleanups = ref StringSet.empty

let () =
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       StringSet.elements !cleanups
       |> Lwt_list.iter_s (fun e -> Lwt_io.eprintlf "cleanup %s" e)
       |> Lwt_main.run;
       failwith "SIGINT"
    )
  |> ignore;

  let p =
    Lwt.finalize
      (fun () ->
         Lwt_io.eprintl "finalizer 1" >>= fun () ->
         cleanups := StringSet.add "1" !cleanups;
         Lwt_unix.sleep 2. >>= fun () ->
         Lwt.finalize
           (fun () ->
              cleanups := StringSet.add "2" !cleanups;
              Lwt_io.eprintl "finalizer 2" >>= fun () ->
              Lwt_unix.sleep 2. >>= fun () ->
              raise Pervasives.Exit)
           (fun () -> Lwt_io.eprintl "cleanup 1" >|= fun () ->
             cleanups := StringSet.remove "1" !cleanups)
         >>= fun () ->
         Lwt_unix.sleep 2.
      )
      (fun () -> Lwt_io.eprintl "cleanup 2" >|= fun () ->
        cleanups := StringSet.remove "2" !cleanups)
  in

  Lwt_main.run p

I think both of these workarounds can be used in combination with Lwt.finalize to create a helper that ensures that the cleanups are run when the program is interrupted. I think Lwt.pick instead of Lwt.choose (<?>) is more suitable for the workaround using Lwt.wait, as it cancels the other thread, but I may be wrong - it seems to get cancelled anyway somehow.


#8

Actually I’ve realized that using Lwt.pick instead of Lwt.choose on the main thread does work as expected, the program is terminated immediately, and all the finalizers get executed:

open Lwt.Infix

let () =
  let w,u = Lwt.wait () in
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       Lwt.wakeup_exn u (Failure "Caught SIGINT"))
  |> ignore;

  let t () =
    Lwt.finalize
      (fun () ->
         Lwt_io.eprintl "block 1" >>= fun () ->
         Lwt_unix.sleep 2. >>= fun () ->
         Lwt.finalize
           (fun () ->
              Lwt_io.eprintl "block 2" >>= fun () ->
              Lwt_unix.sleep 2. >>= fun () ->
              raise Pervasives.Exit)
           (fun () -> Lwt_io.eprintl "cleanup 2")
         >>= fun () ->
         Lwt_unix.sleep 2.
      )
      (fun () -> Lwt_io.eprintl "cleanup 1")
  in
  let t () = Lwt.pick [t (); w] in
  Lwt_main.run (t ())

Output:

$ ./prog 
block 1
^Ccleanup 1
Fatal error: exception Failure("Caught SIGINT")
$ ./prog 
block 1
block 2
^Ccleanup 2
cleanup 1
Fatal error: exception Failure("Caught SIGINT")
$ ./prog 
block 1
block 2
cleanup 2
cleanup 1
Fatal error: exception Pervasives.Exit
$

It also works when I combine threads with Lwt.join:

open Lwt.Infix

let () =
  let w,u = Lwt.wait () in
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       Lwt.wakeup_exn u (Failure "Caught SIGINT"))
  |> ignore;

  let t () =
    Lwt.finalize
      (fun () ->
         Lwt_io.eprintl "block 1" >>= fun () ->
         Lwt_unix.sleep 2. >>= fun () ->
         let t2 () =
           Lwt.finalize
             (fun () ->
                Lwt_io.eprintl "block 2" >>= fun () ->
                Lwt_unix.sleep 2.)
             (fun () -> Lwt_io.eprintl "cleanup 2")
         in
         let t3 () =
           Lwt.finalize
             (fun () ->
                Lwt_io.eprintl "block 3" >>= fun () ->
                Lwt_unix.sleep 2.)
             (fun () -> Lwt_io.eprintl "cleanup 3")
         in
         Lwt.join [t2 (); t3 ()] >>= fun () ->
         Lwt_unix.sleep 2.
      )
      (fun () -> Lwt_io.eprintl "cleanup 1")
  in
  let t () = Lwt.pick [t (); w] in
  Lwt_main.run (t ())

Output:

$ ./prog 
block 1
^Ccleanup 1
Fatal error: exception Failure("Caught SIGINT")
$ ./prog 
block 1
block 3
block 2
cleanup 3
cleanup 2
^Ccleanup 1
Fatal error: exception Failure("Caught SIGINT")
$ ./prog 
block 1
block 3
block 2
^Ccleanup 3
cleanup 2
cleanup 1
Fatal error: exception Failure("Caught SIGINT")
$

I also noticed that when a background thread isn’t connected to the main one (e.g. uses let _ = ...), its finalizer do not get executed at all when I use this method:

open Lwt.Infix

let () =
  let w,u = Lwt.wait () in
  Lwt_unix.on_signal
    Sys.sigint
    (fun _ ->
       Lwt.wakeup_exn u (Failure "Caught SIGINT"))
  |> ignore;

  let t () =
    Lwt.finalize
      (fun () ->
         Lwt_io.eprintl "block 1" >>= fun () ->
         Lwt_unix.sleep 2. >>= fun () ->
         let _ =
           Lwt.finalize
             (fun () ->
                Lwt_io.eprintl "block 2" >>= fun () ->
                Lwt_unix.sleep 2.)
             (fun () -> Lwt_io.eprintl "cleanup 2")
         in
         Lwt_unix.sleep 2.
      )
      (fun () -> Lwt_io.eprintl "cleanup 1")
  in
  let t () = Lwt.pick [t (); w] in
  Lwt_main.run (t ())

Output:

# all finalizers are executed after normal termination:
$ ./prog 
block 1
block 2
cleanup 2
cleanup 1
$ ./prog 
block 1
^Ccleanup 1
Fatal error: exception Failure("Caught SIGINT")
$ ./prog 
block 1
block 2
^Ccleanup 1
Fatal error: exception Failure("Caught SIGINT")
$

When I tried to use this method for a more complicated program with a recursive Lwt thread, it did not work, not even when I joined all the background threads together after the recursive thread :frowning: , so I still had to go for the ugly workaround that I described above.


#9

UPDATE: I’ve found an issue with the ugly workaround I described above (using signal handlers that stop the program by throwing an exception):
When systemd terminates the program using SIGTERM, it will output a backtrace, and will appear to be in the “failed” state. I tried wrapping the whole program with a try/with block to suppress the exception, but it didn’t work sadly. I’ve also tried setting async_exception_hook to a no-op, but that didn’t work either.

I’ve also experimented with using Lwt_main.at_exit instead of signal handlers for running to cleanup code:

(* compile with
   ocamlfind ocamlc -o prog -linkpkg -package lwt -package lwt.unix prog.ml
*)
open Lwt.Infix

module StringSet = Set.Make(String)
let cleanups = ref StringSet.empty

let () =
  Lwt_main.at_exit
    (fun () ->
       StringSet.elements !cleanups
       |> Lwt_list.iter_s (fun e -> Lwt_io.eprintlf "cleanup %s" e)
    );

  let p =
    Lwt.finalize
      (fun () ->
         Lwt_io.eprintl "finalizer 1" >>= fun () ->
         cleanups := StringSet.add "1" !cleanups;
         Lwt_unix.sleep 2. >>= fun () ->
         Lwt.finalize
           (fun () ->
              cleanups := StringSet.add "2" !cleanups;
              Lwt_io.eprintl "finalizer 2" >>= fun () ->
              Lwt_unix.sleep 2. >>= fun () ->
              raise Pervasives.Exit)
           (fun () -> Lwt_io.eprintl "cleanup 1" >|= fun () ->
             cleanups := StringSet.remove "1" !cleanups)
         >>= fun () ->
         Lwt_unix.sleep 2.
      )
      (fun () -> Lwt_io.eprintl "cleanup 2" >|= fun () ->
        cleanups := StringSet.remove "2" !cleanups)
  in

  Lwt_main.run p

But it seems that the function registered with at_exit does not get called when the program receives a signal (for example SIGINT after pressing CTRL+C):

$ ./prog 
finalizer 1
finalizer 2
cleanup 1
cleanup 2
Fatal error: exception Pervasives.Exit
$ ./prog
finalizer 1
^C⏎        
$