Ocaml-multicore: report on a June 2018 development meeting in Paris

multicore
compiler

#1

Earlier this week (24-25 June 2018) we had a meeting in Paris between several OCaml maintainers and researchers (folks from INRIA, OCamllabs, Jane Street, and also Frédéric Bour), and one of the topics discussed was the technical state of the Multicore-OCaml project. I thought that people here could be interested in a status update on that. Note that it’s not an update on user features (no release date for Multicore-OCaml in this post, sorry!), but an update on technical development plans, so that people who follow the compiler distribution development on https://github.com/ocaml/ocaml/ (or https://github.com/ocamllabs/ocaml-multicore/) have some context for what is coming next.

I’m going to concentrate here on the parts that concern the multicore runtime: garbage collection and low-level runtime code, summarizing the main points from my own notes of the meeting. Note that I’m not working on Multicore-OCaml myself, I’m reporting on work done by others. I certainly have made some mistakes below, and may update my post to fix some of those.

TL;DR: In 2019 we hope to integrate the multicore runtime into the upstream compiler distribution, even if it is not enabled by default (still experimental), so that it can evolve with the distribution and get the kind of testing and code porting experience we need to make decisions before a wider release.

Current state of the multicore runtime

KC Sivaramakrishnan ( @kayceesrk ) gave a presentation on the current state of the Multicore OCaml project. The slides are available. They are a good summary of the history and current state of the multicore runtime.

General milestone: migrate the multicore-ocaml runtime into the main distribution

Stephen Dolan ( @stedolan ) and KC reported that it is a lot of work for them to keep up with changes in the OCaml compiler implementation (they just recently rebased their branch on top of 4.06, it was 4.02 before that). When someone changes something in the OCaml compiler or runtime, they check that the rest of the distribution works properly or fix/adapt what is affected by the change; those changes are not tested against the multicore runtime, and Stephen and KC have to do all the fix/adapt work for all changes themselves.

To help with this problem, the consensus is that we should upstream the multicore-runtime code into the compiler distribution soon-ish, not enabled by default, so that people that change the compiler codebase immediately see the effect on the multicore runtime, and can participate in fixing and adapting the code with respect to their change – with guidance from multicore experts. For a period of time, the multicore runtime would only be available as an experimental option.

We hope that this migration could be done in about a year.

Q3/Q4 2018: build-up PRs

There are some PRs that change part of the current compiler and runtime to play nicer with the multicore bits, but are not part of the multicore runtime themselves. The plan is to get these submitted as PRs during the next development cycle (for 4.08, although we don’t know whether they will be merged by the 4.08 release). This is not a new process: some such PRs have already been integrated in the last few months/years (see for example GPR#1073, GPR#1723), but the hope is to get as many of those preliminary changes merged as possible.

Q3/Q4 2018 (?): forward-compatible C API

We know that Multicore OCaml will require some changes to the way C stubs are written for OCaml. Of course, we cannot expect authors of C bindings to switch overnight, so there will be some transition period where existing C bindings won’t be able to use the multicore runtime and will have to be ported. On the other hand, the current runtime should be able to support the multicore-friendly C API, so that ported code can work on both runtimes.

We agreed that it would be useful to have this “multicore-friendly C API” be available as soon as possible, or at least part of it, even before the multicore-runtime itself lands, so that people can already start making their code forward-compatible. Stephen already tried to do this in a giant PR (GPR#1003) last year, but it was too big and never quite finished.

My understanding is that the multicore-OCaml people are still not completely sure about what the final API will be, which things definitely have to be broken and which things might be supported with more work on their side, and hesitated to push changes that could end up unnecessary. We agreed to try again, with smaller API changes submitted separately from each other, instead of a giant single change; and agreed in principle that it was OK to propose new APIs that had a small chance of not being necessary in the end, as long as the API is sensible and can be easily implemented on top of the current runtime. (Early adopters might face a bit of code churn as things stabilize.)

(One first change that we want to look at is GPR#1798, which implements a notion of C-API versioning.)

2019: multicore runtime features, as an experimental runtime

Then the plan is to start merging the multicore runtime codebase itself, piece by piece, so that it becomes possible to perform larger-scale experiments with it. It still wouldn’t be enabled by default at this point, but it would be part of the actively moving compiler distribution, and in particular remain at feature parity with the rest of the compiler and runtime codebase.

In this phase we’ll need people to review the multicore runtime implementation, if only to help future upstream maintenance. We have started to ask around who would be potentially interested – hopefully with a related skill set. The project also has a laundry list of pending tasks that could also be worked on by the people studying the codebase.

Some of the tasks being worked on involve implementing parts of the OCaml runtime system that are not yet fully supported by the multicore runtime, such as Ephemerons (a generalization of weak pointers). My understanding is that the multicore devs would like to reach feature parity before merging the runtime code, but this may be re-discussed and changed if some parts of the runtime prove too difficult to support.

remark: the runtime/language split

The current multicore-ocaml fork/switch contains both the multicore runtime, and an implementation of (untyped) effect handlers in the surface language, as the way for users to access concurrency features (to control the fibers / green-threads). Effect handlers come in evolving proposals of their own: there is a type-and-effect system being worked on by Leo White, and they are being discussed as well, in a somewhat independent way. Bundling the two changes in the same patchset makes reviewing more difficult, and it also created some silly technical issues: because effect handlers change the language AST, most ppx-extension code is broken on the multicore-OCaml fork, which makes it difficult to use language tooling, to test user programs, run interesting benchmarks, etc.

In the short term the plan for upstreaming the runtime is to separate it from the effect-handler part, by exposing an extremely minimal fiber-control API, as compiler primitives or as part of the Obj runtime. That is not how anybody wants end-users to access the multicore runtime, but it would be a minimal device for the first period of runtime code upstreaming and reviewing, to make it easy to compile any codebase against the multicore-aware compiler, and use the standard OCaml packages and tooling in a multicore switch.

remark: performance tuning, not yet

Right now the multicore-OCaml devs, if I understand correctly, have been mostly working with micro-benchmarks, in large part because of the difficulty of using regular OCaml packages and tooling previously mentioned. A lot of opportunities (and necessity) for performance tuning will appear once macro-benchmarks and realistic workloads become available, and once some of the larger performance-sensitive codebases (which often include some C bindings or compiler-sensitive Obj hackery) have been ported. As Anil Madhavapeddy (@avsm) pointed out, once more code out there can be benchmarked against the multicore runtime we should start continuously monitoring the performance results.

The general expectation is that the multicore runtime will be slower for purely-sequential programs than the current runtime, but the goal is to keep this overhead small (a first goal that was mentioned was a 10% overhead, although we really don’t know yet how easy/hard that target is). The two distinct runtimes may remain available in the distribution for as long as enough users ask for the sequential runtime and the overhead is high enough to justify the maintenance cost of keeping both. (In terms of the multicore-runtime performance on sequential workloads, some things can be made faster at the cost of being harder to write and possibly more painful to maintain, so there are tradeoffs still to be explored there.)

One thing I found interesting that Stephen explained to me is: you cannot just take a sequential program (say Coq), compile and run it under a multicore switch, and expect to get a meaningful “overhead number” (as in: “the multicore runtime is X% slower than the sequential one on this program”). The problem is that GCs can be configured to have more or less memory overhead – asking for less memory overhead results in more GC work, so a slower overall program. It doesn’t make sense to only compare the default settings of two GCs for time, as they may have very different memory-overhead profiles: maybe the second GC looks faster, but if you adjust its settings to use no more memory than the first it actually is slower. What you have to do instead is to try to plot the 2D time/memory compromise, and compare the graphs for the two GCs, or at least compare the entire plot of the new GC with the results of your current GC.
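To make the time/memory idea concrete, here is a minimal sketch of how one could sample such a curve for a single runtime, by running the same workload under several values of the standard `space_overhead` GC parameter. The `workload` and `measure` functions and the chosen settings are made up for illustration, and reading `heap_words` at the end of a run is only a rough proxy for memory use; a serious measurement would run each setting in its own process on a realistic program and record peak memory.

```ocaml
(* Sketch only: sample (time, memory) points for one GC by varying
   space_overhead. All names below except the Gc/Sys/List/Printf
   functions are hypothetical, chosen for this example. *)

(* Hypothetical allocation-heavy workload: build and sum many lists. *)
let workload () =
  let total = ref 0 in
  for i = 1 to 2_000 do
    let l = List.init 1_000 (fun j -> i + j) in
    total := !total + List.fold_left (+) 0 l
  done;
  !total

(* Run [f] under the given space_overhead setting.
   Returns (result, CPU seconds, major heap size in words after the run). *)
let measure ~space_overhead f =
  let saved = Gc.get () in
  Gc.set { saved with Gc.space_overhead = space_overhead };
  Gc.compact ();                       (* start from a freshly collected heap *)
  let t0 = Sys.time () in
  let result = f () in
  let elapsed = Sys.time () -. t0 in
  let stat = Gc.quick_stat () in
  Gc.set saved;                        (* restore the previous GC settings *)
  (result, elapsed, stat.Gc.heap_words)

(* One point of the curve per setting: lower values ask for less memory
   overhead at the price of more GC work. *)
let points =
  List.map
    (fun o ->
      let (r, t, w) = measure ~space_overhead:o workload in
      (o, r, t, w))
    [ 20; 80; 200; 400 ]

let () =
  List.iter
    (fun (o, _, t, w) ->
      Printf.printf "space_overhead=%d  time=%.3fs  heap_words=%d\n" o t w)
    points
```

Collecting the same points with the other runtime and plotting both sets on the same time/memory axes gives the kind of comparison described above, instead of a single “X% slower” number.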

Summary

  • In the next six months, we hope to start merging most of the preparatory work, and a forward-compatible C API, into the upstream compiler distribution.
  • Then in 2019 we hope to start merging the multicore runtime itself (independently of effect handlers), as a non-default experimental option. We will need people to review the codebase and feel more confident in their ability to also edit it.
  • This should allow much more extensive performance testing, and the porting of some performance-sensitive codebases, so that we can get a better picture of the performance profile, of the difficulty to port code, potential parts that need to be reworked, etc.

Plenty of interesting applications of a multicore runtime (and of a typed effect system) have also been discussed, interesting memory-model questions, formalization questions, etc. This is definitely an interesting time for the OCaml community!


#2

Dear Gabriel, thank you very much for the great summary.

Do we have any update on how the merging plan is going?

I was able to compile Coq with the multicore branch [tho lack of support for Thread is an issue for running], and indeed we are very interested to see what the roadmap is, as IMHO Coq could greatly benefit from multicore support.


#3

Things have been progressing, but slower than planned (as always). I haven’t heard anything specific/pointed back from the multicore-ocaml devs myself, but from the upstream perspective it looks like we are still in the phase “buildup work in the compiler before the core runtime changes” – planned Q3/Q4 2018. Recent changes submitted by Stephen include GPR#1725 and GPR#1917.


#4

Thanks for the update above @gasche

Are there any risks for the merge of multicore into the main OCaml trunk? I have been generally following the discussions on github and there seem to be interesting discussions on various tickets that are tagged as multicore-prerequisite (and other multicore related PRs). Seeing these discussions makes me wonder if there could be big bumps along the way in actually getting this work merged into trunk.

Or is the multicore technical approach generally accepted by the OCaml compiler maintainers and it is a matter of fleshing out the details? Is there a possibility that the maintainers might just find certain technical decisions made by the multicore team unpalatable?

OCaml with multicore would be a potentially amazing platform to build on. Haskell, though “multicore” for many years, has many weaknesses of its own (laziness, over-abstraction, complexity etc). Golang OTOH does not have enough abstraction and is quite low level and imperative. The JVM has its own problems. Here OCaml is likely to hit the sweet spot.

(And yes, I’m aware of LWT in the meanwhile but I find it quite low level compared to similar stuff that can be done in Haskell).


#5

I’ll reply in the best way I can, but please keep in mind that I haven’t been involved personally in the ocaml-multicore work (although I did help review and integrate some of the multicore-prerequisite changes).

> Are there any risks for the merge of multicore into the main OCaml trunk?

We don’t yet have conclusive performance numbers on the overhead of the multicore runtime for single-threaded computations. (A lot of progress has been made on a benchmarking infrastructure to measure this.) We also don’t have full visibility on the impact, ecosystem-wide, of the changes to the C FFI. If the overhead is high, or if the low-level changes incur too much breakage, this means that most people, who do not have a strong need for multicore usage in their programs, could keep using the non-multicore runtime for the time being. (Even if the runtime is merged upstream, there is a risk of it remaining a rarely-used option, or even that it would not be maintained on par with the main runtime.)

(Another risk would be that the people working on the multicore runtime today would move to something else before the merge is finished. For now it seems that they will keep getting funded at appropriate levels for this to not happen, which is very fortunate indeed. Jane Street and OCamllabs have been extremely helpful in funding the work so far, even though they may not have had much actual business-based motivation for a multicore runtime when the work started.)

> Or is the multicore technical approach generally accepted by the OCaml compiler maintainers and it is a matter of fleshing out the details?

Yes, this is my understanding. The various technical decisions have not been reviewed in depth yet, but there is agreement on the general design. (Earlier work on multicore runtimes for ocaml, such as ocaml4multicore, did not get to that point of general consensus.)

> (And yes, I’m aware of LWT in the meanwhile but I find it quite low level compared to similar stuff that can be done in Haskell).

Well it may also be possible to build higher-level abstractions on top of Lwt/Async to express what it is that you want to express – for example transactional operations, if that is your thing. (But that does not replace the interest of parallel computations.)