What happens in runtime memory when calling `Unix.fork`?

Hi.
I have a question about OCaml’s runtime memory model: what happens in runtime memory when calling Unix.fork?

I’ve read memory representation from Real World OCaml, but I still did not understand the internal about this situation well.

If a program has a lot of (maybe mutable) boxed data, e.g. a bunch of sets, maps, hash tables, etc, and it calls Unix.fork, are those data all copied to each child process? Or, are those copied when some child process tries to use them (i.e. cow; copy-on-write)? Are those deeply-copied or shallow-copied? If some child process completed their job, what does GC do?

I’m asking this question because one of my colleagues encountered a double-free exception during using Unix.fork. I didn’t ask the detailed code and he also forgot this issue and moved on, so I have no reproducible code snippet. But I’m really curious about this issue.

TL;DR, in what cases can someone encounter double-free errors when using OCaml and Unix.fork?

Unix.fork is a thin wrapper on top of fork(2); as such the whole process is duplicated at invocation time, including any runtime data structures (in particular, the heap).

Best wishes,
Nicolás

1 Like

I believe that is not quite true.

Semantically, everything is duplicated, but generally I think most OS’s actually duplicate data only when it’s changed. This is for a few reasons but generally a fork is quickly followed by an exec so it’s not a great use of resources to copy a bunch of memory that will immediately be thrown out. Similarly if the program forking uses more than half the RAM, it will not be able to execute any other programs via fork + exec if it is all copied.

1 Like

Yes, but as you point out, those are implementation details of fork(2). As far as the OCaml runtime is concerned (and the OP’s question), the whole process is duplicated, including the heap.

Best wishes,
Nicolás

2 Likes

That is true, but that is not that relevant regarding OCaml. Keep in mind that OCaml is using a garbage collector, and that operating systems tend to not immediately give the processor to the child process, but keep executing the parent process. So, the probability that a collection happens in the parent process before the child process has had time to call exec is actually non-negligible. As a consequence, the operating system will end up duplicating the whole memory, since the garbage collector will more or less write to all the memory pages during its marking phase.

(The paragraph above only applies to the fork+exec case. Not to the vfork+exec case. But I do not know if someone ever tried to use vfork on an OCaml program.)

1 Like

I seem to remember that the JaneStreet libraries use vfork. @Yaron_Minsky would know.

1 Like

They do, it’s mentioned multiple times in the documentation (I was curious myself so looked it up).

1 Like

That is not what I had in mind, as I was wondering about whether someone had provided an Ocaml interface to the vfork system call, in the spirit of Unix.fork. But it is still interesting to know that JaneStreet did use vfork in their C code. Thanks.

1 Like

I’d forgotten how complicated the rules are for using vfork(). [well, they’re simple to state, but quite restricting in practice.] The child process must immediately do an exec(), and I mean … as soon as humanly possible. And it can’t dirty memory hardly at all. It’s a pity there isn’t a halfway-between fork() and vfork(), where the parent remains suspended, but the address-space segments are cloned, so the child can do a little more before exec.

1 Like

Yes. Those objects are roughly located in two arrays, the minor heap and the major heap. When a process forks the new fork gets the copy of the parent process. Modern operating systems are performing the Copy-on-Write (CoW) optimization, so that a memory page (OS doesn’t operate on byte level, but on the page level) is copied to the space of the child process only if it accessed for modification. Since, OCaml garbage collector actually changes the objects in its heap (to trigger different colors) the copy will happen very soon. Basically, the first major GC cycle will touch and copy pages from the parent process to the child process even if neither was modifying any data.

That means, that an ideal strategy is to fork as soon as possible, and establish the communication between forks using pipes or shared memory.

1 Like

Wow. Thank all of you who gave me nice comments.

So, as far as I understand based on the comments,

  • The child process created by fork() will have the copied data from parent (heap) in its space (with CoW)
  • GC of each process will touch their own memory spaces only.
    -> So, it is hard (not possible?) to encounter a double-free error.

Then what happened to my colleague? I’m really curious.
For reference, he called Gc.compact before actually calling Unix.fork (maybe this could affect some possibility of double-free?)

I believe that running into double-free errors with (single-core) OCaml is very rare, thanks to the GC. But I could not fully understand how runtime acts when the program runs on multi-core via Unix.fork. Still, I didn’t figured out in what case double-free errors could happen with OCaml.

Did your colleague’s program have any native-code libraries linked-in? If they ever encounter this again, it would be very valuable to try to capture a test case. Even if it’s a big, complex one, it’s possible that that could be trimmed-down. But really, without a testcase, it’s hard to draw any inferences.

1 Like

However many cores or processors are involved, should be immaterial. The virtual memory of the two processes, parent and child, should be mapped to entirely different physical memory, so neither can have any effect on the other, except through shared resources like file descriptors - and including shared memory pages such as might be created by mmap() et al.

1 Like

It is difficult to see how a true double free could arise but it is much easier to see how other memory related resource errors could arise on forking a multi-threaded program in ocaml. That is because in a multi-threaded program on a POSIX platform, only async-signal-safe functions may be called between a fork and an exec. Some allocator implementations (eg glibc’s) insert atfork handlers to enable you to invoke safely malloc and free between a fork and an exec, but other allocators (eg Chromium’s tcmalloc allocator) do not and will crash.

I have a recollection of concluding 2 or 3 years ago that you cannot confidently invoke ocaml’s Unix.create_process in a multi-threaded program, because I think I found that its implementation allocated memory between the fork and the exec. Possibly that has changed since then - looking at the source of create_process should reveal. In any event, you can write your own C stub for most cases which does the equivalent of create_process, but safely, provided you only have scalar arguments or allocate your memory before you fork.

1 Like

https://github.com/ocaml/ocaml/pull/9573 should have fixed that problem by avoiding the use of fork in the Unix.create_process function if I understand correctly, ^^

1 Like

If your colleugaue would hit a double-free because of fork, then it is a bug (and a security vulnerability that will shock the whole world) in the operating system. Which is highly unlikely of course. The operating system abstraction separates processes address spaces so nothing that is done with the memory of one process can affect the other.

Also, your colleague called Gc.compact before the fork, so fork here is a typical red herring. As well as Gc.compact. Since Gc.compact does the full cycle of GC operations it commonly triggers bugs that are associated with incorrectly written C stubs. In fact, it is a common technique to call Gc.compact and see if anything is triggered. So the problem is not in fork or in Gc.compact but somewhere in FFI.

1 Like

Wow, thank you all. Now I know what happened.

It is true that the colleague used lots of FFI and file descriptors, so it is hard to tell which of them are the main cause of the double free. But it is quite sure that both “Unix.fork” and “Gc.compact” is not.

Since I’m not familiar with multi process programming so it took me a while to understand. Thanks to everyone for the detailed explanation.