Forking in UTop not behaving as expected

I am trying to understand why utop hangs when calling the following test_fork function:

let print_pid name =
  Printf.printf "%s PID: %i\n" name (Unix.getpid ()) ;
  flush stdout

let test_fork () =
  print_pid "top" ;
  match Unix.fork () with
  | 0 -> (
      print_pid "child" ;
      match Unix.fork () with
      | 0 ->
          print_pid "grand child" ;
          let shell = "/bin/sh" in
          ignore (Unix.setsid ()) ;
          Unix.execv shell [|shell; "-c"; "ls"|]
      | _ -> exit 0 )
  | _ -> ignore (Unix.wait ())

The child’s call to exit does not seem to complete: the child process is still listed by ps -ax, so the parent (top) just hangs on the wait. There seems to be a race condition as well. If I add some code before the child’s exit, then it succeeds, although I still get the exception Unix.Unix_error(Unix.EINTR, "wait", ""), as if I had manually killed the child process.

If I run the compiled code in a bytecode runtime, things seem to behave more or less as expected.

Additionally the grand child’s call to Unix.setsid does not detach it from the terminal so the shell call still outputs to the current terminal. This is independent of whether running in utop or not.

I am on a MacBook with utop version 2.13.1 (using OCaml version 5.1.0).

Oh, and if your system hangs as well, running kill -9 on the printed child PID should get you out of the jam. You may want to do the same for the grand child.

Out of curiosity, what version of macOS are you running?

I have an M1 mac with Monterey version 12.6.7.

I don’t have any answers, but:

  1. I’m glad you checked a toplevel executable. This shows that it’s not an OCaml problem, but rather a toplevel problem.

  2. Does your bug reproduce with the regular OCaml toplevel? That would narrow down whether it’s a toplevel problem or a utop problem.

Just random ideas for how to eliminate variables.

Good idea! I tried with plain OCaml thanks to #use_output "dune ocaml top". It also hangs, but the child and grand child both exited, so I assume the parent is hanging in Unix.wait. It matched the bytecode executable. (I misread the output in my first edit.) So it does appear to be UTop specific.

What do you make of the setsid call not detaching from the terminal as promised in the Unix module documentation?
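
On the setsid question, a likely explanation: setsid puts the process in a new session with no controlling terminal, but it does not touch file descriptors that are already open. The grand child still inherits stdin, stdout and stderr pointing at the terminal, so ls keeps writing there. A minimal sketch of redirecting them to /dev/null before the exec, assuming a Unix-like system; detach_stdio and exec_detached are hypothetical names, not part of the Unix module:

```ocaml
(* Hypothetical helper: point stdin/stdout/stderr at /dev/null so that
   nothing is read from or written to the inherited terminal. *)
let detach_stdio () =
  let devnull = Unix.openfile "/dev/null" [Unix.O_RDWR] 0 in
  Unix.dup2 devnull Unix.stdin ;
  Unix.dup2 devnull Unix.stdout ;
  Unix.dup2 devnull Unix.stderr ;
  (* Only close the extra descriptor if it is not one of the standard three. *)
  if devnull <> Unix.stdin && devnull <> Unix.stdout && devnull <> Unix.stderr
  then Unix.close devnull

(* Hypothetical helper, meant to be called in the grand child after the
   second fork: new session, no terminal I/O, then replace the process. *)
let exec_detached cmd =
  let shell = "/bin/sh" in
  ignore (Unix.setsid ()) ;
  detach_stdio () ;
  Unix.execv shell [|shell; "-c"; cmd|]
```

In test_fork, the grand child branch would then call exec_detached "ls" in place of the setsid and execv lines. If the output is still wanted, a log file could be opened instead of /dev/null.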

On Linux I would have already reached for strace to ascertain whether the parent was hanging on exit or not. You’re on a Mac (so xBSD), and IIUC you can use truss (dtruss on macOS) to do the same? Just to verify that we understand what’s going on.

Actually plain OCaml top appears to work.

OK, so a bytecode executable works, and so does ocaml (top). Only utop doesn’t work. That’s somewhat not-surprising, right? Utop is pretty visual, and I’d be shocked to learn that it didn’t manipulate the terminal, process groups, etc, etc. So if you really want this to work, I think the next thing is to figure out what it’s doing in that regard. strace/truss might help here. But also, do you really need to fork() twice? Can’t you fork/exec and be done with it?

Part of why I ask that, is that I wouldn’t be surprised to learn that there’s stuff going on at exit. So in the child process, your exit() call is going to run that stuff. At a minimum, you might want to switch to a lower-level exit call: I remember there’s a lower-level one in C that doesn’t run any of the atexit() hooks, and maybe that’d help.

But really, you’re going to have to find out what utop does that’s special.

For now I am choosing to manually launch the client program instead of forking from utop. It’s a bit cumbersome but at least it works. I will look into the lower level exit call as well.

From OCaml-4.12 this is wrapped as Unix._exit, and should be used instead of exit for any but the last process to finish. The other thought is whether the test code starts any Thread module threads, or utop does so in the process it is managing. If so, Unix.fork won’t work correctly: from the prospective documentation in OCaml-5.2: “[Unix.fork] fails if the OCaml process is multi-core (any domain has been spawned). In addition, if any thread from the Thread module has been spawned, then the child process might be in a corrupted state.”
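
A sketch of what that could look like for the test_fork pattern, assuming OCaml 4.12 or later. The name spawn_grandchild is made up for illustration, and EINTR handling around the wait is left out for brevity:

```ocaml
(* Double fork in which the intermediate child leaves via Unix._exit,
   so no at_exit hooks inherited from the parent process run in it. *)
let spawn_grandchild () =
  match Unix.fork () with
  | 0 -> (
      match Unix.fork () with
      | 0 -> (
          let shell = "/bin/sh" in
          ignore (Unix.setsid ()) ;
          (* If the exec itself fails, leave without running at_exit hooks. *)
          try Unix.execv shell [|shell; "-c"; "ls"|]
          with _ -> Unix._exit 127 )
      | _ -> Unix._exit 0 )
  | pid ->
      (* Wait for the intermediate child only and return its status. *)
      snd (Unix.waitpid [] pid)
```

Waiting on the specific pid with Unix.waitpid, rather than Unix.wait, avoids reaping some unrelated child that the host process may have started.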

Thanks. It makes sense to use Unix._exit per the documentation, and it seems to produce a reliable exit. However, the exit is still abnormal, so the wait raises Unix.Unix_error(Unix.EINTR, "wait", ""). I had to wrap the wait in a try block. At least now I can limp forward, more conveniently than manually launching the grand child, though I still need a way to detach it from the terminal.

For the curious, this pattern is used in the OCaml debugger to launch the debuggee as a client. The wait is necessary so that the parent’s call to accept comes after the client’s open_connection. (That is my understanding, though I guess it is still not guaranteed.)

The test_fork alone is enough to trigger the behavior described. I didn’t use Thread.

Note that EINTR is not an abnormal failure. Indeed, any blocking system call can spuriously throw EINTR, in which case it should be retried in a loop, until it terminates normally. This failure happens on reception of any unblocked signal. (It might even be caused by the SIGCHLD signal, which in the case of wait would be quite dumb, but who knows.)

Indeed you are right! The problem is with not handling interruption of blocking calls rather than being UTop specific. I ended up wrapping calls with:

(* blocking system calls may raise EINTR *)
let rec handle_eintr sysf =
  try sysf () with Unix.(Unix_error (EINTR, _, _)) -> handle_eintr sysf
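
For anyone landing here later, a small usage sketch of that wrapper around the wait. wait_status is a hypothetical helper name; handle_eintr is repeated so the snippet compiles on its own:

```ocaml
(* Retry a blocking system call for as long as it fails with EINTR. *)
let rec handle_eintr sysf =
  try sysf () with Unix.Unix_error (Unix.EINTR, _, _) -> handle_eintr sysf

(* Hypothetical helper: wait for one specific child, retrying on EINTR,
   and describe how the child terminated. *)
let wait_status pid =
  match handle_eintr (fun () -> Unix.waitpid [] pid) with
  | _, Unix.WEXITED code -> Printf.sprintf "exited with code %d" code
  | _, Unix.WSIGNALED signal -> Printf.sprintf "killed by signal %d" signal
  | _, Unix.WSTOPPED signal -> Printf.sprintf "stopped by signal %d" signal
```

The system call is passed as a thunk (fun () -> ...) so that it is issued again on each retry rather than evaluated once up front.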