I remember long ago, there was an idea of building a “multi-VM” version of OCaml, where you would have multiple complete OCaml abstract machines in a single address-space. So: separate heaps, separate sets of threads, etc. And then using message-passing a la Erlang for communication. And each VM would be like a current GIL OCaml VM (so pre-multicore).
I wonder what happened with that? It seemed to me like an elegant solution to incorporating SMP or NUMA parallelism, while not incurring all the complexity of true multithreaded runtimes and GC.
In short, taking a page from Erlang, and also from MPI.
What would be missing today? You could do most of that with a custom build:
1. Define your own `main()` to initialize your message-passing library.
2. Compile your OCaml code with `ocamlopt -output-complete-obj` (aka `(modes (native object))` in Dune) to get an object file `xyz.o` containing the runtime.
3. Use `objcopy --redefine-syms` on `xyz.o` to create `xyz1.o`, `xyz2.o`, etc. For each copy, redefine the caml symbols (e.g. `_Caml_state` to `_Caml1_state`, `_caml_alloc` to `_caml1_alloc`, etc.).
4. Spawn `caml_startup_1()`, `caml_startup_2()`, etc. in threads from `main()`.
5. Use normal FFI statements, e.g. `external message_pass_put : int -> string -> unit = "my_message_passing_function"`, to communicate from the many OCaml runtimes to the C message-passing library.
6. Link all the `xyzN.o`, your `main()` function, and probably a message handler to shut down `main()`, into a single executable.
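The `objcopy` step is the unusual part, so here is a minimal sketch of the renaming on a toy object file (assuming a Linux box with binutils and a C compiler; `demo.c` and its symbols are made up for illustration — a real build would generate one rename line per `caml_*`/`Caml_*` symbol extracted from the runtime object with `nm`):

```shell
# Toy stand-in for xyz.o: a tiny C object with two "caml" symbols.
cat > demo.c <<'EOF'
int caml_demo_counter = 0;
int caml_demo_get(void) { return caml_demo_counter; }
EOF
cc -c demo.c -o demo.o

# Symbol-rename file for runtime copy #1 ("old new" pairs, one per line).
cat > syms1.txt <<'EOF'
caml_demo_counter caml1_demo_counter
caml_demo_get caml1_demo_get
EOF

# Produce demo1.o with the renamed symbols; demo.o is left untouched,
# so the same input can yield demo2.o, demo3.o, etc. with other files.
objcopy --redefine-syms=syms1.txt demo.o demo1.o
nm demo1.o
```

The renamed copies no longer collide at link time, which is what lets several runtimes coexist in one executable.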
Having said that, I don’t know why anyone would do that unless they were embedding OCaml inside of a bigger C program that was already using some message passing framework. I’m curious if that (or something similar) is a real use case.
So the, uh, motivation (IIRC) was to reproduce the Erlang model in OCaml. There, Erlang “processes” (== “threads”) are significantly lighter-weight than UNIX processes.
Yes, this is a real use case. The classical way to do parallelism in OCaml was by using multiple processes and message passing. If one could run multiple independent runtimes in a single process I think it would make this design more lightweight.
For large codebases that use a lot of global state, it can be essentially impossible to rewrite them to make use of OCaml 5-style parallelism. So the above (multiple runtimes running in a single process) would be a nice intermediate step.
Thank you! I think that would be useful for developing PostgreSQL extensions with OCaml. I should try that approach, because right now it’s not possible to have several such extensions loaded into the same PostgreSQL process at the same time.
There used to be netmulticore (which was experimental AFAIU). It used an external heap shared between several OCaml processes (so they shared part of the address space), with something more elaborate than message-passing in mind, depending on what you count as message-passing. It comes with its own set of challenges, but I have the impression that this was not perceived as giving rise to worthy questions at a scientific level.
I am broadly interested in this, for various reasons. For instance, for low-latency applications you cannot mix a low-latency domain with a high-latency domain in the OCaml stop-the-world design (every domain must reach a safepoint for a minor collection, so one slow domain stalls them all), so an abstraction above that of domains makes sense to me.
ocamlnet and its netmulticore module is something that comes to mind obviously.
I have used it (it was the backend in parany for some time), and I think it was in production at some companies where Martin Jambon worked in the past.
So, I think it was not so experimental but rather “production ready”.
I think that the design idea of having a special Gc for things which are shared between processes and having those things clearly marked was not such a bad idea.
Thanks, I did not know that netmulticore was used in production (though it did look pretty elaborate for a mere prototype). It is even worse than I thought.
As far as I know, Erlang’s VM is full of very specific design choices that will apply to no other runtime in existence, and definitely not to OCaml’s.
In no particular order:
- the only shared values are (immutable, refcounted) large blobs
- all other values are deep-copied when sent from one process to another (possibly living on a remote machine). The data model of Erlang means that all values are serializable in ETF, a bespoke but specified binary format.
- processes totally own their heap (I don’t think it’s generational, but it can be quite tiny anyway). There is basically no sharing besides large blobs.
- less relevant, but processes are fully preemptable, so they’ll never be stuck in an infinite loop.
Anyway, I don’t think looking at Erlang for inspiration for OCaml is super useful. It’s very interesting, but ML gives the user a lot more power, including mutation.