Fork fails with "Cannot allocate memory" despite lots of memory

I’m having issues with Unix.fork (). It fails reliably with

Uncaught exception:
  
  (Unix.Unix_error "Cannot allocate memory" fork "")

even though OCaml is using less than 1 GB of memory when fork is called and there is plenty of RAM and swap space available. The same program runs just fine on macOS on my laptop.
I’m running OCaml 4.05.0 on the Ubuntu machine and OCaml 4.06.0 on the laptop.


The most likely causes on a server environment are:

* the RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which limits the number of processes and threads for a real user ID, was reached;
* the PID limit (pids.max) imposed by the cgroup "process number" (PIDs) controller was reached.

ulimit -a | grep process will give you a hint about the first. For the second, maybe cat /proc/$$/cgroup? If it’s another user that’s unable to fork, check that user’s limits instead. You could also try something like perl -le 'print fork' (it should print two numbers, one of them 0) to test forking independently of OCaml.
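Both limits can also be checked from inside the OCaml program itself. The sketch below prints the "Max processes" row of /proc/self/limits (that row is RLIMIT_NPROC) and, for each cgroup listed in /proc/self/cgroup, tries to read the matching pids.max file. It assumes Linux and that cgroups are mounted under /sys/fs/cgroup (cgroup v2) or /sys/fs/cgroup/pids (cgroup v1); on other layouts it simply reports the file as not readable.

```ocaml
(* Sketch, assuming Linux with /proc mounted and cgroups mounted under
   /sys/fs/cgroup (v2) or /sys/fs/cgroup/pids (v1). Stdlib only. *)

(* Read every line of a file; None if it cannot be opened. *)
let read_lines path =
  match open_in path with
  | exception Sys_error _ -> None
  | ic ->
    let rec loop acc =
      match input_line ic with
      | line -> loop (line :: acc)
      | exception End_of_file -> close_in ic; List.rev acc
    in
    Some (loop [])

(* Naive substring test; the stdlib has no substring search. *)
let contains ~sub s =
  let n = String.length sub and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = sub || go (i + 1)) in
  go 0

(* Map one /proc/self/cgroup line ("id:controllers:path") to the pids.max
   file it implies: cgroup v2 has an empty controller list, cgroup v1 lists
   "pids" among its controllers. The mount points are assumptions. *)
let pids_max_path line =
  match String.split_on_char ':' line with
  | [ "0"; ""; path ] ->
    Some (Filename.concat ("/sys/fs/cgroup" ^ path) "pids.max")
  | [ _; ctrls; path ] when List.mem "pids" (String.split_on_char ',' ctrls) ->
    Some (Filename.concat ("/sys/fs/cgroup/pids" ^ path) "pids.max")
  | _ -> None

let () =
  (* RLIMIT_NPROC is reported as the "Max processes" row. *)
  (match read_lines "/proc/self/limits" with
   | Some lines ->
     List.iter print_endline (List.filter (contains ~sub:"Max processes") lines)
   | None -> prerr_endline "cannot read /proc/self/limits");
  (* pids.max for every cgroup this process belongs to, if readable. *)
  match read_lines "/proc/self/cgroup" with
  | None -> prerr_endline "cannot read /proc/self/cgroup"
  | Some lines ->
    List.iter
      (fun line ->
        match pids_max_path line with
        | None -> ()
        | Some p ->
          (match read_lines p with
           | Some (v :: _) -> Printf.printf "%s = %s\n" p v
           | _ -> Printf.printf "%s not readable\n" p))
      lines
```

The root cgroup has no pids.max file, so "not readable" for a path directly under /sys/fs/cgroup usually just means no PID limit is set at that level.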


Another possible reason is memory overcommit: if the forking process has a large virtual address space, the kernel's overcommit settings (vm.overcommit_memory) can make fork fail.
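A quick way to see which overcommit policy is in effect is to read /proc/sys/vm/overcommit_memory together with the CommitLimit and Committed_AS lines of /proc/meminfo. The sketch below assumes Linux; the mode descriptions paraphrase the kernel's overcommit accounting documentation.

```ocaml
(* Sketch, assuming Linux: report vm.overcommit_memory and the commit
   counters from /proc/meminfo. Stdlib only. *)

(* Read every line of a file; None if it cannot be opened. *)
let read_lines path =
  match open_in path with
  | exception Sys_error _ -> None
  | ic ->
    let rec loop acc =
      match input_line ic with
      | line -> loop (line :: acc)
      | exception End_of_file -> close_in ic; List.rev acc
    in
    Some (loop [])

(* Meaning of the three documented modes of vm.overcommit_memory. *)
let describe_overcommit = function
  | "0" -> "mode 0: heuristic overcommit (default), obvious overcommits are refused"
  | "1" -> "mode 1: always overcommit"
  | "2" -> "mode 2: strict accounting, limit is roughly swap + overcommit_ratio% of RAM"
  | other -> "unknown mode " ^ other

let () =
  (match read_lines "/proc/sys/vm/overcommit_memory" with
   | Some (mode :: _) -> print_endline (describe_overcommit (String.trim mode))
   | _ -> prerr_endline "cannot read /proc/sys/vm/overcommit_memory");
  (* In mode 2, allocations (including the memory a fork must commit for
     the child) fail with ENOMEM once Committed_AS would exceed CommitLimit. *)
  match read_lines "/proc/meminfo" with
  | None -> ()
  | Some lines ->
    List.iter
      (fun l ->
        if String.length l >= 6 && String.sub l 0 6 = "Commit" then print_endline l)
      lines
```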

I have the same problem even after raising the limits with:

ulimit -n 65535
ulimit -s 32768
ulimit -l 327680

I have 64 GB of memory and the OCaml program uses barely 2 GB. I use Lwt and Lwt_preemptive. Could Lwt_preemptive have a bug in this respect?

Someone who has the problem should just run ktrace (strace on Linux) and/or the debugger and figure out what’s going on.

It says the exception happens here, which doesn’t look definitive, though:

I don’t mean running the debugger at the OCaml level. Find out what was passed to fork down at the syscall level and what fork returned, especially if it returned an error code in errno.

This is why I suggested running ktrace or dtrace or what have you. It will hand you the error on a plate with an apple in its mouth.
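At the OCaml level, the errno returned by the underlying system call is the first argument of Unix.Unix_error, so the program can report it even without a tracer. A sketch, assuming the unix library is linked (and OCaml 4.12 or later for Unix._exit). Note that fork(2) documents the RLIMIT_NPROC and pids.max cases as EAGAIN, while ENOMEM means the kernel could not allocate the memory or structures the child needs.

```ocaml
(* Sketch: distinguish the errno carried by Unix.Unix_error after a failed
   fork. Needs the unix library; Unix._exit requires OCaml 4.12 or later. *)

(* Per fork(2): EAGAIN means a process/thread limit was hit, ENOMEM means
   the kernel could not allocate what the child needs. *)
let classify = function
  | Unix.ENOMEM -> "ENOMEM: kernel could not allocate memory for the child"
  | Unix.EAGAIN -> "EAGAIN: a process limit (e.g. RLIMIT_NPROC or pids.max) was hit"
  | e -> Unix.error_message e

let fork_or_report () =
  match Unix.fork () with
  | 0 ->
    (* Child: leave at once, without running at_exit handlers. *)
    Unix._exit 0
  | pid ->
    ignore (Unix.waitpid [] pid);
    Ok pid
  | exception Unix.Unix_error (err, fn, arg) ->
    Error (Printf.sprintf "%s(%S) failed: %s" fn arg (classify err))

let () =
  match fork_or_report () with
  | Ok pid -> Printf.printf "fork ok, child pid was %d\n" pid
  | Error msg -> prerr_endline msg
```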

By the way, no luck with the tracer: nothing suspicious.

What did it say that the last system call was, and what did it say the return value was?

[pid 626513] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0470b32610) = -1 ENOMEM (Cannot allocate memory)

clone isn’t the same as fork. This is probably a different issue. How many threads do you have running in this process? You may be running out of task structures.
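To answer the thread-count question from inside the process, Linux exposes it as the Threads: field of /proc/self/status and as the entries of /proc/self/task. A sketch, assuming Linux; it counts kernel-level threads, so any worker threads started by a thread pool are included.

```ocaml
(* Sketch, assuming Linux: read the thread count of the current process.
   Stdlib only. *)

(* Parse a "Threads:\t<n>" line from /proc/self/status. *)
let parse_threads line =
  match String.split_on_char ':' line with
  | [ "Threads"; n ] -> int_of_string_opt (String.trim n)
  | _ -> None

(* Scan /proc/self/status for the Threads: field. *)
let thread_count () =
  let ic = open_in "/proc/self/status" in
  let rec loop () =
    match input_line ic with
    | line -> (match parse_threads line with Some n -> Some n | None -> loop ())
    | exception End_of_file -> None
  in
  let result = loop () in
  close_in ic;
  result

let () =
  (* Cross-check against the number of entries in /proc/self/task. *)
  match thread_count (), Array.length (Sys.readdir "/proc/self/task") with
  | exception Sys_error msg -> prerr_endline msg
  | Some n, tasks ->
    Printf.printf "Threads: %d (entries in /proc/self/task: %d)\n" n tasks
  | None, _ -> prerr_endline "no Threads: line in /proc/self/status"
```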

Just 4 real threads.
And from the OCaml point of view, clone seems to be the same as fork; at least the error message is the same:

Uncaught exception:
  
  (Unix.Unix_error "Cannot allocate memory" fork "")

It might be the same message from OCaml, but it’s not the same to the kernel.

Can you produce a minimized example of this that fails on Linux (as few lines as possible to reproduce the behavior) and I can try to debug it from there on my own machine?
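A hypothetical skeleton for such a reproducer: keep a configurable amount of data live in the OCaml heap, then fork repeatedly and report the first failure. The heap size, the number of forks, and the command-line interface are assumptions for illustration. The original program also uses Lwt and Lwt_preemptive, which this skeleton leaves out; they would be the next thing to add if it does not fail as is.

```ocaml
(* Hypothetical reproducer skeleton; heap size and fork count are
   illustrative knobs, not values from the thread. Needs the unix library;
   Unix._exit requires OCaml 4.12 or later. *)

(* Keep [mb] megabytes live in the OCaml heap as filled 1 MB buffers. *)
let allocate_heap mb = List.init mb (fun _ -> Bytes.make (1024 * 1024) 'x')

(* Fork [n] children that exit immediately, waiting for each one.
   Returns how many forks succeeded and the first error, if any. *)
let fork_many n =
  let rec go i =
    if i = n then (i, None)
    else
      match Unix.fork () with
      | 0 -> Unix._exit 0
      | pid ->
        ignore (Unix.waitpid [] pid);
        go (i + 1)
      | exception Unix.Unix_error (err, fn, _) ->
        (i, Some (fn ^ ": " ^ Unix.error_message err))
  in
  go 0

let () =
  let arg k default =
    if Array.length Sys.argv > k then int_of_string Sys.argv.(k) else default
  in
  let heap_mb = arg 1 16 and forks = arg 2 10 in
  let heap = allocate_heap heap_mb in
  let ok, err = fork_many forks in
  Printf.printf "heap %d MB: %d/%d forks succeeded\n" heap_mb ok forks;
  (match err with Some e -> print_endline e | None -> ());
  (* Keep [heap] reachable until the forks are done. *)
  ignore (Sys.opaque_identity heap)
```

For example, passing 2048 100 as arguments would approximate the 2 GB heap mentioned earlier in the thread.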

I was trying to find the cause or a way to reproduce it easily, but so far no luck.

We used to have a problem with Core + Lwt + fork which, as far as I remember, was somehow connected with the recording of backtraces: backtraces were recorded in some threads but not in others, and this led to heap corruption, with segfaults/OOMs. If I remember correctly, the workaround was to either enable or disable backtraces explicitly, using environment variables and/or the Printexc module. Hope this helps; I can’t remember more, it was 10 years ago or so.
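Making the backtrace setting explicit at startup could look like the sketch below. Printexc.record_backtrace and Printexc.backtrace_status are standard library functions, and OCAMLRUNPARAM=b enables recording from the environment; whether either choice avoids the problem described above is not verified here.

```ocaml
(* Sketch: choose the backtrace setting once, at startup, before Lwt or any
   threads are started. Uses only the standard library's Printexc. *)

(* Set recording on or off and check that the runtime reports the same. *)
let configure_backtraces ~enabled =
  Printexc.record_backtrace enabled;
  Printexc.backtrace_status () = enabled

let () =
  (* Pick one value for the whole program; true has the same effect as
     running with OCAMLRUNPARAM=b. *)
  if not (configure_backtraces ~enabled:true) then
    prerr_endline "backtrace setting did not take effect";
  Printf.printf "backtraces recorded: %b\n" (Printexc.backtrace_status ())
```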