Multicore prerequisite patches appearing in released OCaml compilers now

The release of OCaml 4.09.0 is particularly significant for us at OCaml Labs, as it represents a phase shift in our development efforts towards integrating multicore parallelism into the language. For the past few years, we have been implementing multicore as a branch based off released versions of the compiler. We finished rebasing it to OCaml 4.06.1 in April and since then have been working on upstreaming a series of incremental changes to OCaml itself.

OCaml 4.09.0 is the first release in which multicore patches appear in a released version of the compiler. This is not the full multicore feature set, but rather the prerequisite changes needed before parallelism can be introduced into the runtime. You can now expect to see a regular set of incremental changes towards multicore in every release of OCaml as we ramp up our upstreaming efforts.

One decision we have taken recently is to spend our time on upstreaming changes rather than on further rebases to more recent versions of the compiler. If someone does have a pressing need for a rebase to OCaml 4.08 or 4.09, then please get in touch with me – but bear in mind that it’s a significant amount of work and so will need to be justified with a use case.

In the meantime, here’s a summary of what some of those patches are, and what to expect in future releases:

4.09.0

  • In the upcoming multicore GC, object headers (tags and lengths) are immutable due to multiple threads scanning the heap simultaneously; any mutations could violate heap invariants in another thread and cause corruption. Therefore, Obj.truncate (#2279) and Obj.set_tag (#1725) have now been deprecated, and all uses removed from the standard library.

  • Values can be passed from OCaml to C by registering them under a known name using the Callback.register function. They can later be retrieved from C using caml_named_value, which returns a value* that can then later be dereferenced. OCaml 4.09.0 modifies the C return type to const value* to indicate that the C code cannot use the pointer that is returned to mutate the value that is registered (#2293). The ability to mutate a value using the raw pointer returned by caml_named_value is incompatible with the upcoming multicore GC, and rarely (never?) used in existing single-core OCaml code.
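The two 4.09.0 changes above can be sketched from the OCaml side. This is a minimal illustration, not code from the patches themselves: the helper `shrink` and the registered name `"add_one"` are hypothetical. The first part shows one way to avoid `Obj.truncate` by copying the needed prefix into a fresh array; the second registers a closure that C code could look up with `caml_named_value`.

```ocaml
(* Instead of shrinking an array in place with the deprecated Obj.truncate,
   copy the prefix that is still needed into a fresh array. The original
   block's header is never mutated. Array.sub raises Invalid_argument if
   n is out of range. *)
let shrink (a : 'a array) (n : int) : 'a array =
  Array.sub a 0 n

(* Register an OCaml closure under a name (the name "add_one" is an
   assumption for this sketch). C code can retrieve it with
   caml_named_value("add_one"); from 4.09.0 the result is a const value*,
   so the C side can read and call it but not overwrite it. *)
let add_one (x : int) : int = x + 1

let () = Callback.register "add_one" add_one

let () =
  let a = [| 1; 2; 3; 4 |] in
  let b = shrink a 2 in
  Printf.printf "original length: %d, shrunk copy length: %d\n"
    (Array.length a) (Array.length b)
```

Copying costs an allocation that in-place truncation avoided, which is the price of keeping object headers immutable.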

Ongoing for 4.10.0~dev

This is the next release, which will be branched imminently now that OCaml 4.09.0 has been released.

  • Variables that are global in the OCaml runtime need to be duplicated per-domain in multicore, since each parallel thread of execution maintains its own table of domain local variables. OCaml 4.10.0 moves all such global C variables into a “domain state” table (#8713). While the change does not introduce any API changes, it significantly alters code generation by reserving a register that was previously used as the exception pointer in every CPU backend for quickly accessing the domain state table. It is therefore a syntactically heavy change, but shouldn’t modify the semantics of your code. If you do notice any oddities when testing OCaml 4.10~dev when it is released as a beta, please do report a reproduction case upstream.

  • (bonus change) While emerging from deep in a rabbit hole from fixing thread stack overflow detection and reentrant marshalling by ensuring that allocation functions do not trigger OCaml callbacks when invoked from C, it was discovered that major GC hooks could also interact with the GC heap. This is now forbidden (#8711) in OCaml 4.10.0. There was no code found in the wild that did not already conform to this restriction, and it is generally safer this way for the multicore GC as well.
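The idea behind the domain state table can be illustrated with a toy model. The real change is in the runtime's C code and the compiler backends; the record type and field names below are hypothetical and only show why state reached through a per-domain record can be duplicated, while a process-wide global cannot.

```ocaml
(* Toy model only: these field names are assumptions for illustration,
   not the layout of the runtime's real domain state table. *)
type domain_state = {
  mutable minor_collections : int;
  mutable backtrace_active : bool;
}

(* Before: a single global shared by the whole process. *)
let global_minor_collections = ref 0
let record_global () = incr global_minor_collections

(* After: all state is reached through a record, so each domain can be
   handed its own independent copy. *)
let fresh_state () = { minor_collections = 0; backtrace_active = false }

let record_minor_collection (st : domain_state) =
  st.minor_collections <- st.minor_collections + 1

let () =
  record_global ();
  let d0 = fresh_state () and d1 = fresh_state () in
  record_minor_collection d0;
  record_minor_collection d0;
  record_minor_collection d1;
  Printf.printf "global=%d d0=%d d1=%d backtrace=%b\n"
    !global_minor_collections d0.minor_collections d1.minor_collections
    d0.backtrace_active
```

In the compiler itself, the reserved register plays the role of the `st` argument here: it gives generated code a fast path to the current domain's state.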

Ongoing for 4.11.0~dev

As 4.10 is about to be branched, we are working away on the following next set of features to push upstream into OCaml 4.11:

  • Better safe points (#187)
  • Tracing and deprecating the instrumented runtime
  • Converging on the representation of closures in bytecode and native code.
  • Modifying GC colors to suit multicore.

As always, these chunks of ongoing work are subject to change as the technical design process is quite iterative and dependent on benchmarking results, but are hopefully useful for you to know!

This is exciting. Thanks for doing that work!

I am curious: what is missing for the first multicore release? Is there anything beyond the changes mentioned?

You can try satisfying your curiosity by inspecting the existing 4.06.1+multicore branch and diffing it against ocaml/ocaml#trunk, and then categorising the missing patches from the results of the three-way diff. Post it here when you generate that list :slight_smile:

Anil help me get hype about multicore. Is it going to have an appreciable performance hit for single-threaded only apps? This is an important sticking point for the grizzled UNIX beards who insist fork is the only true multiprogramming model.

This was discussed at the June 2018 meeting: Ocaml-multicore: report on a June 2018 development meeting in Paris

tl;dr maybe less than 10%, but maybe comparing singlecore to multicore runtimes is not that simple

hype level: moderate

thanks!

The thread linked by @gadmm focuses on integration of the multicore runtime into the main distribution. This feature would remain dormant for a while. There is still some work needed beyond merging existing code. Here are the bits I know of, though of course the multicore developers could tell you better.

  1. The biggest missing piece is a type system for effects. In the current implementation, effects are untyped (just like exceptions). There has been at least one proposal by Leo White, but as far as I know it is still under discussion. Note that effects are independent from multicore programming, but are bundled in the Multicore OCaml project as they are a tool to control the flow of execution and implement cooperative concurrency.

  2. On another front, unless I’m mistaken, only the x86-64 architecture is currently supported (see #87 and #86) and, regarding operating systems, Windows is unsupported (I don’t know whether members of the BSD family are).

  3. It’s likely some work would have to be done reviewing the standard library for thread-safety. There is also the question of what to do with the existing Thread API (systhread) (see also #240 and #239).

  4. Some more things: compaction is not implemented yet, and ephemerons are unsupported. Ephemerons in their current form add trickiness to the GC, so a direction would be to only support a less expressive flavour.
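To make the "untyped, just like exceptions" remark in point 1 concrete, here is a small sketch using ordinary exceptions only. It does not use the multicore effects syntax; the names `Negative`, `check_positive` and `safe_check` are made up for the example. It shows what a type system for effects would aim to fix: the function's type says nothing about what may escape from it.

```ocaml
exception Negative of int

(* The type is int -> int: nothing in it records that Negative may
   escape, which is what "untyped" means here. *)
let check_positive (x : int) : int =
  if x < 0 then raise (Negative x) else x

(* Leaving out this handler would not be a type error; the program
   would only fail at run time with an uncaught exception. *)
let safe_check (x : int) : int option =
  match check_positive x with
  | v -> Some v
  | exception Negative _ -> None

let () =
  match safe_check (-3), safe_check 5 with
  | None, Some 5 -> print_endline "handled"
  | _ -> print_endline "unexpected"
```

Untyped effects behave the same way: an unhandled effect is discovered at run time rather than by the type checker.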

Thank you for such a comprehensive answer!