Async.Writer.stderr stuck after process fork

I’m experimenting with async, core, and a pre-forking TCP server. I noticed that Log.Global.info was not logging anything in my child processes, but only if the parent process logged something before forking.

I narrowed this down a bit and produced the following program (async and core 0.14.0, OCaml 4.10, Linux). It shows that after the parent process uses Writer.stderr, newly forked child processes cannot: writes are seemingly stuck, there is no output, and the Deferred is never completed. In my example program, I expect both child1 and child2 to write two messages each: one via Debug.eprintf and one via Writer.stderr. In between forking child1 and child2, the parent forces Writer.stderr.

Am I doing something wrong? Could Writer.stderr be “reset” somehow (like the Scheduler is)?

Program output:

parent forced Writer.stderr
child1 via Debug.eprintf
child1 via Writer.stderr
exited normally: child1
child2 via Debug.eprintf

child2 never writes via Writer.stderr, and never exits (which it would do after flushing Writer.stderr).

Program:

open Core
open Async

let in_child name =
  Scheduler.reset_in_forked_process ();

  let out = force Writer.stderr in
  let msg = Format.sprintf "%s via Writer.stderr\n" name in

  begin
    Writer.write out msg;
    Writer.flushed out
    >>= fun _ ->
    exit 0
  end |> don't_wait_for;

  schedule
    (fun () -> Debug.eprintf "%s via Debug.eprintf" name);

  Scheduler.go ()

let waitpid pid name =
  Unix.waitpid pid
  >>| fun status ->
  Debug.eprintf "%s: %s" (Core.Unix.Exit_or_signal.to_string_hum status) name

let () =
  match Core.Unix.fork () with
  | `In_the_child ->
     in_child "child1" |> never_returns
  | `In_the_parent pid1 ->
     let _ = force Writer.stderr in
     Debug.eprintf "parent forced Writer.stderr";

     match Core.Unix.fork () with
     | `In_the_child ->
        in_child "child2" |> never_returns
     | `In_the_parent pid2 ->
        begin
          waitpid pid1 "child1"
          >>= fun () ->
          waitpid pid2 "child2"
        end |> don't_wait_for;

        Scheduler.go () |> never_returns

On modern UNIXes, one difference between parent and child processes is that the child does not inherit all threads, just the thread that performed the fork(). Could that be a problem? I don’t know enough about Async to know how it deals with forking, but I recall it uses multiple POSIX threads.

Thanks @Chet_Murthy , that sounds like a likely source of problems.

Re-reading the docs for Scheduler.reset_in_forked_process, it’s clear that the parent is not expected to actually use Async prior to forking.

If a process that has already created, but not started, the Async scheduler would like to fork …

When I have time, I’m going to investigate Thread_safe.reset_scheduler which seems like it might allow me to use Async in the parent for some initialization, stop it, fork a child, then resume Async in both parent and child.

If I can make a suggestion, it’s probably worth looking into Async’s code to see if they have forks anyplace, so you can see what they tested. If there are any tests, I’d start with those, and add little bits of Async functionality; if there are no tests, start off with the smallest bit of Async functionality you can test (with a fork) and then add to it, bit by bit. Eventually you should be able to figure out which particular bit is causing problems.

With a sufficiently large system, this approach can be faster than trying to understand the code well enough to figure out what went wrong, ab initio.

If it is an option for you, I would look into doing fork-exec instead of forking. Forking is pretty tricky with async, as you already saw.
You can fork-exec the same executable just with different arguments and you can communicate between processes using pipes or shared memory.

Just to add, in your example, forcing stderr is what’s causing the issue since it starts the async scheduler.


@milan Yes, that seems like the most straightforward path. Here’s a working attempt:

The supervisor creates, binds, and listens on a socket, then forks+execs itself with appropriate args for the child process to run as a worker that accepts connections. The child process inherits the file descriptor from the parent, and “imports” that into an Async.Unix.Socket.
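For anyone who wants the shape of that without Async in the picture, here is a minimal sketch using only the standard library's Unix module. It is not the code from my attempt, and several things in it are assumptions made for illustration: the "worker" argument name, the helper names, binding to an ephemeral loopback port, and handing the listening socket to the child as its stdin (inetd style) rather than passing a descriptor number on the command line. The supervisor also connects to its own socket once so the example runs to completion. It has to be compiled to an executable and linked against the unix library, since it re-executes Sys.executable_name.

```ocaml
(* Read from [fd] until end of file and return everything received. *)
let read_all fd =
  let buf = Bytes.create 256 in
  let out = Buffer.create 256 in
  let rec loop () =
    match Unix.read fd buf 0 (Bytes.length buf) with
    | 0 -> Buffer.contents out
    | n -> Buffer.add_subbytes out buf 0 n; loop ()
  in
  loop ()

(* Worker mode: the listening socket was installed as file descriptor 0
   by the supervisor, so we accept connections on Unix.stdin. *)
let run_worker () =
  let client, _addr = Unix.accept Unix.stdin in
  let msg = Bytes.of_string "hello from worker\n" in
  ignore (Unix.write client msg 0 (Bytes.length msg));
  Unix.close client

(* Supervisor mode: bind and listen, then start a fresh copy of this
   executable in worker mode (fork+exec) instead of a bare fork. *)
let run_supervisor () =
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen sock 16;
  let addr = Unix.getsockname sock in
  (* The listening socket becomes the worker's stdin; stdout and stderr
     are shared with the supervisor. *)
  let pid =
    Unix.create_process Sys.executable_name
      [| Sys.executable_name; "worker" |]
      sock Unix.stdout Unix.stderr
  in
  Unix.close sock;
  (* Act as a client once, only to demonstrate that the worker accepts. *)
  let client = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect client addr;
  let reply = read_all client in
  Unix.close client;
  ignore (Unix.waitpid [] pid);
  reply

let () =
  match Sys.argv with
  | [| _; "worker" |] -> run_worker ()
  | _ -> print_string (run_supervisor ())
```

Because the worker starts from a fresh exec, it has no inherited scheduler or thread state to reset. In an Async worker, the inherited descriptor would then be wrapped for Async before accepting on it, which is the “import” step described above.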

It’s somewhat of a letdown that fork() causes so many problems, but I still have a lot to learn about Async.