Async.Writer.stderr stuck after process fork

pmahoney · February 23, 2021, 6:28am

I’m experimenting with async, core, and a pre-forking TCP server. I noticed that Log.Global.info was not logging anything in my child processes, but only if the parent process had logged something before forking.

I narrowed this down a bit and produced the following program (async and core 0.14.0, OCaml 4.10, Linux). It shows that once the parent process has used Writer.stderr, newly forked child processes cannot: their writes appear to be stuck, with no output, and the Deferred returned by Writer.flushed never completes. In my example program, I expect both child1 and child2 to write two messages each: one via Debug.eprintf and one via Writer.stderr. In between forking child1 and child2, the parent forces Writer.stderr.

Am I doing something wrong? Could Writer.stderr be “reset” somehow (like the Scheduler is)?

Program output:

parent forced Writer.stderr
child1 via Debug.eprintf
child1 via Writer.stderr
exited normally: child1
child2 via Debug.eprintf

child2 never writes via Writer.stderr, and never exits (which it would do after flushing Writer.stderr).

Program:

open Core
open Async

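let in_child name =
  (* Ask for a fresh Async scheduler in the forked child. *)
  Scheduler.reset_in_forked_process ();

  let out = force Writer.stderr in
  let msg = Format.sprintf "%s via Writer.stderr\n" name in

  (* Write through Async's Writer and exit once the write is flushed. *)
  begin
    Writer.write out msg;
    Writer.flushed out
    >>= fun _ ->
    exit 0
  end |> don't_wait_for;

  (* Also print with Debug.eprintf, which does not go through Writer. *)
  schedule
    (fun () -> Debug.eprintf "%s via Debug.eprintf" name);

  Scheduler.go ()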

let waitpid pid name =
  Unix.waitpid pid
  >>| fun status ->
  Debug.eprintf "%s: %s" (Core.Unix.Exit_or_signal.to_string_hum status) name

let () =
  match Core.Unix.fork () with
  | `In_the_child ->
     in_child "child1" |> never_returns
  | `In_the_parent pid1 ->
     let _ = force Writer.stderr in
     Debug.eprintf "parent forced Writer.stderr";

     match Core.Unix.fork () with
     | `In_the_child ->
        in_child "child2" |> never_returns
     | `In_the_parent pid2 ->
        begin
          waitpid pid1 "child1"
          >>= fun () ->
          waitpid pid2 "child2"
        end |> don't_wait_for;

        Scheduler.go () |> never_returns

Chet_Murthy · February 23, 2021, 7:25pm

On modern UNIXes, one difference between parent and child processes is that the child does not inherit all threads, only the thread that performed the fork(). Could that be a problem? I don’t know enough about Async to know how it deals with forking, but I recall it uses multiple POSIX threads.

pmahoney · February 24, 2021, 3:19pm

Thanks @Chet_Murthy , that sounds like a likely source of problems.

Re-reading the docs for Scheduler.reset_in_forked_process, it’s clear that the parent is expected not to have actually used Async before forking:

If a process that has already created, but not started, the Async scheduler would like to fork …

When I have time, I’m going to investigate Thread_safe.reset_scheduler, which seems like it might allow me to use Async in the parent for some initialization, stop it, fork a child, then resume Async in both parent and child.

Chet_Murthy · February 24, 2021, 7:44pm

If I can make a suggestion, it’s probably worth looking into Async’s code to see if they fork anywhere, so you can see what they tested. If there are any tests, I’d start with those and add little bits of Async functionality; if there are no tests, start with the smallest bit of Async functionality you can test (with a fork) and then add to it, bit by bit. Eventually you should be able to figure out which particular bit is causing problems.

With a sufficiently large system, this approach can be faster than trying to understand the code well enough to figure out what went wrong, ab initio.

milan · February 24, 2021, 11:47pm

If it is an option for you, I would look into doing fork-exec instead of forking. Forking is pretty tricky with async, as you already saw.
You can fork-exec the same executable just with different arguments and you can communicate between processes using pipes or shared memory.

Just to add, in your example, forcing stderr is what’s causing the issue since it starts the async scheduler.
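
The fork-exec approach suggested above could look roughly like the following. This is only a minimal sketch that uses the standard Unix library rather than Async, so that nothing from the parent's runtime is carried into the worker. The "--worker" flag and the names parse_role, run_worker and spawn_worker are hypothetical, made up for this illustration. The worker's stdout is connected to a pipe that the supervisor reads from.

```ocaml
(* Sketch only: standard Unix library, no Async.
   The "--worker" flag and all function names are hypothetical. *)

let worker_flag = "--worker"

(* Decide from the command line whether this process is a worker. *)
let parse_role argv =
  match Array.to_list argv with
  | _ :: flag :: name :: _ when flag = worker_flag -> `Worker name
  | _ -> `Supervisor

(* Worker: report on stdout, which the supervisor connected to a pipe. *)
let run_worker name =
  print_endline (name ^ " running in a freshly exec'd process");
  exit 0

(* Supervisor: start this same executable again with worker arguments,
   read one line from it, then wait for it to exit. *)
let spawn_worker name =
  let read_fd, write_fd = Unix.pipe ~cloexec:true () in
  let pid =
    Unix.create_process Sys.executable_name
      [| Sys.executable_name; worker_flag; name |]
      Unix.stdin write_fd Unix.stderr
  in
  (* Close our copy of the write end so EOF is seen when the worker exits. *)
  Unix.close write_fd;
  let ic = Unix.in_channel_of_descr read_fd in
  let line = try Some (input_line ic) with End_of_file -> None in
  close_in ic;
  let _pid, status = Unix.waitpid [] pid in
  (line, status)

let () =
  match parse_role Sys.argv with
  | `Worker name -> run_worker name
  | `Supervisor ->
    (match spawn_worker "child1" with
     | Some line, Unix.WEXITED 0 -> print_endline ("worker said: " ^ line)
     | _ -> prerr_endline "worker did not report back")
```

Because the worker starts from a fresh exec, it could initialise Async itself without inheriting any scheduler state from the supervisor. The sketch assumes Sys.executable_name resolves to the running binary.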

pmahoney · March 5, 2021, 5:31am

@milan Yes, that seems like the most straightforward path. Here’s a working attempt:

github.com/pmahoney/FrameworkBenchmarks/blob/d5ffd8f4a07470c6dc3f773172ef695358fd679c/frameworks/OCaml/httpaf/async/httpaf_async.ml

(**

   A pre-forking http server using Async. Note that rather than simply
   forking child processes, the supervisor process forks then execs
   itself, with args instructing the worker process which inherited
   file descriptor to use as the listening socket.

   The reason for fork+exec instead of just fork is that once Async is
   started, it's difficult (or impossible) for a forked child process
   to function correctly.

   The supervisor finds its executable via Sys.argv.(0), which thus must
   be the path to the executable.

*)

open Core
open Async
open Log.Global
open Httpaf

This file has been truncated.

The supervisor creates, binds, and listens on a socket, then forks+execs itself with appropriate args for the child process to run as a worker that accepts connections. The child process inherits the file descriptor from the parent, and “imports” that into an Async.Unix.Socket.
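
For readers who want the shape of this without opening the linked file, here is a rough sketch of the socket hand-off using only the standard Unix library. It is not the code from the linked file and it makes one simplifying assumption: instead of passing a descriptor number in the arguments, the listening socket is handed to the worker as its standard input. The "--worker" flag and the names is_worker, run_worker and run_supervisor are hypothetical. The supervisor also connects once as a client, purely to show that the worker accepts on the inherited socket.

```ocaml
(* Sketch only: standard Unix library, no Async.
   Assumption: the listening socket is passed as the worker's stdin,
   not by descriptor number. All names here are hypothetical. *)

let worker_flag = "--worker"

let is_worker argv = Array.length argv >= 2 && argv.(1) = worker_flag

(* Worker: accept one connection on the inherited socket (fd 0). *)
let run_worker () =
  let client, _addr = Unix.accept Unix.stdin in
  let reply = Bytes.of_string "hello from worker\n" in
  ignore (Unix.write client reply 0 (Bytes.length reply));
  Unix.close client;
  exit 0

(* Supervisor: bind and listen first, then start this executable again
   with the listening socket as the worker's stdin. *)
let run_supervisor () =
  let listen_fd =
    Unix.socket ~cloexec:true Unix.PF_INET Unix.SOCK_STREAM 0 in
  (* Port 0 lets the kernel pick a free port on the loopback address. *)
  Unix.bind listen_fd (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listen_fd 16;
  let pid =
    Unix.create_process Sys.executable_name
      [| Sys.executable_name; worker_flag |]
      listen_fd Unix.stdout Unix.stderr
  in
  (* Connect once as a client; give up after two seconds of silence. *)
  let addr = Unix.getsockname listen_fd in
  let conn = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.setsockopt_float conn Unix.SO_RCVTIMEO 2.0;
  let buf = Bytes.create 64 in
  let n =
    try
      Unix.connect conn addr;
      Unix.read conn buf 0 (Bytes.length buf)
    with Unix.Unix_error _ -> 0
  in
  Unix.close conn;
  Unix.close listen_fd;
  ignore (Unix.waitpid [] pid);
  Bytes.sub_string buf 0 n

let () =
  if is_worker Sys.argv then run_worker ()
  else print_string ("supervisor received: " ^ run_supervisor ())
```

In an Async-based worker, the inherited descriptor would then be wrapped in Async's own socket type after the exec, as described above; that step is left out here to keep the sketch independent of any particular Async version.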

It’s somewhat of a letdown that fork() causes so many problems, but I still have a lot to learn about Async.
