The release of OCaml 4.09.0 is particularly significant for us at OCaml Labs, as it represents a phase shift in our development efforts towards integrating multicore parallelism into the language. For the past few years, we have been implementing multicore as a branch based off released versions of the compiler. We finished rebasing it to OCaml 4.06.1 in April and since then have been working on upstreaming a series of incremental changes to OCaml itself.
OCaml 4.09.0 is the first release in which multicore patches appear in a released version of the compiler. This is not the full multicore feature set, but rather the prerequisites for the changes needed to introduce parallelism into the runtime. You can now expect to see a regular set of incremental changes towards multicore in every release of OCaml as we ramp up our upstreaming efforts.
One decision we have taken recently is to spend our time on upstreaming changes rather than on further rebases to more recent versions of the compiler. If someone does have a pressing need for a rebase to OCaml 4.08 or 4.09, then please get in touch with me, but bear in mind that it’s a significant amount of work and so will need to be justified with a use case.
In the meantime, here’s a summary of what some of those patches are, and what to expect in future releases:
4.09.0
In the upcoming multicore GC, object headers (tags and lengths) are immutable due to multiple threads scanning the heap simultaneously; any mutations could violate heap invariants in another thread and cause corruption. Therefore, Obj.truncate (#2279) and Obj.set_tag (#1725) have now been deprecated, and all uses removed from the standard library.
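As an illustration of what moving away from Obj.truncate can look like, here is a minimal sketch. The function name filter_array is hypothetical and is not taken from the standard library patches. It assumes the common pattern of over-allocating a buffer and then shrinking it, and replaces the in-place shrink with a copy of the used prefix, so no object header is ever mutated.

```ocaml
(* Hypothetical helper: keep the elements of [src] that satisfy [keep]. *)
let filter_array (keep : 'a -> bool) (src : 'a array) : 'a array =
  let n = Array.length src in
  if n = 0 then [||]
  else begin
    (* Over-allocate: at most [n] elements can be kept. *)
    let buf = Array.make n src.(0) in
    let used = ref 0 in
    Array.iter
      (fun x ->
        if keep x then begin
          buf.(!used) <- x;
          incr used
        end)
      src;
    (* Instead of shrinking [buf] in place with Obj.truncate, copy the
       used prefix into a fresh array of exactly the right length. *)
    Array.sub buf 0 !used
  end
```

The copy costs one extra allocation, which is the price of leaving the header of buf untouched.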
Values can be passed from OCaml to C by registering them under a known name using the Callback.register function. They can later be retrieved from C using caml_named_value, which returns a value* that can then later be dereferenced. OCaml 4.09.0 modifies the C return type to const value* to indicate that the C code cannot use the pointer that is returned to mutate the value that is registered (#2293). The ability to mutate a value using the raw pointer returned by caml_named_value is incompatible with the upcoming multicore GC, and rarely (never?) used in existing single-core OCaml code.
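For readers who have not used this mechanism, a minimal sketch of the OCaml side follows. The function format_result and the name it is registered under are hypothetical, chosen only for illustration. On the C side, the pointer obtained from caml_named_value for that name would now be stored in a const value * variable and only read through, never written through.

```ocaml
(* Hypothetical function that C code may want to call back into. *)
let format_result (n : int) : string = Printf.sprintf "result=%d" n

(* Publish the closure under a name that C code can look up with
   caml_named_value. The name "format_result" is an assumption made
   for this example. *)
let () = Callback.register "format_result" format_result
```

Code that only reads the registered value through the returned pointer should need nothing more than the const qualifier on the C declaration.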
Ongoing for 4.10.0~dev
This is the next release, which will be branched imminently now that OCaml 4.09.0 has been released.
Variables that are global in the OCaml runtime need to be duplicated per-domain in multicore, since each parallel thread of execution maintains its own table of domain local variables. OCaml 4.10.0 moves all such global C variables into a “domain state” table (#8713). While the change does not introduce any API changes, it significantly alters code generation by reserving a register that was previously used as the exception pointer in every CPU backend for quickly accessing the domain state table. It is therefore a syntactically heavy change, but shouldn’t modify the semantics of your code. If you do notice any oddities when testing OCaml 4.10~dev when it is released as a beta, please do report a reproduction case upstream.
(bonus change) While emerging from a deep rabbit hole of fixing thread stack overflow detection and reentrant marshalling (by ensuring that allocation functions do not trigger OCaml callbacks when invoked from C), it was discovered that major GC hooks could also interact with the GC heap. This is now forbidden (#8711) in OCaml 4.10.0. No code was found in the wild that did not already conform to this restriction, and it is generally safer this way for the multicore GC as well.
Ongoing for 4.11.0~dev
As 4.10 is about to be branched, we are working away on the next set of features to push upstream into OCaml 4.11:
Converging on the representation of closures in bytecode and native code.
Modifying GC colors to suit multicore.
As always, these chunks of ongoing work are subject to change as the technical design process is quite iterative and dependent on benchmarking results, but are hopefully useful for you to know!
You can try satisfying your curiosity by inspecting the existing 4.06.1+multicore branch and diffing it against ocaml/ocaml#trunk, and then categorising the missing patches from the results of the three-way diff. Post it here when you generate that list.
Anil, help me get hyped about multicore. Is it going to have an appreciable performance hit for single-threaded only apps? This is an important sticking point for the grizzled UNIX beards who insist fork is the only true multiprogramming model.
The thread linked by @gadmm focuses on integration of the multicore runtime into the main distribution. This feature would remain dormant for a while. There is still some work needed beyond merging existing code. Here are the bits I know of; of course, the multicore developers could tell you better.
The biggest missing piece is a type system for effects. In the current implementation, effects are untyped (just like exceptions). There has been at least one proposal by Leo White, but as far as I know it is still under discussion. Note that effects are independent from multicore programming, but are bundled in the Multicore OCaml project as they are a tool to control the flow of execution and implement cooperative concurrency.
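To make "cooperative concurrency via effects" concrete, here is a minimal sketch of a round-robin scheduler on a single domain. It is written against the Effect module as it later shipped in OCaml 5.0, which is an assumption: the multicore branch discussed in this thread used a dedicated effect syntax instead. The Yield effect and the names run_all, say and log are hypothetical. Note that the effect is untyped in the sense described above: nothing in the type of a task says it may perform Yield.

```ocaml
(* Requires OCaml 5.0 or later for the Effect module. *)
open Effect
open Effect.Deep

(* Hypothetical effect: a task performs Yield to let other tasks run. *)
type _ Effect.t += Yield : unit Effect.t

(* Records the order in which tasks make progress. *)
let log : string list ref = ref []
let say s = log := s :: !log

(* Run all tasks to completion, switching task at every Yield. *)
let run_all (tasks : (unit -> unit) list) : unit =
  let ready : (unit -> unit) Queue.t = Queue.create () in
  let run_next () =
    match Queue.take_opt ready with
    | Some thunk -> thunk ()
    | None -> ()
  in
  let start task =
    match_with task ()
      { retc = (fun () -> run_next ());  (* task finished: run another *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the rest of this task and switch to the next one. *)
              Queue.add (fun () -> continue k ()) ready;
              run_next ())
          | _ -> None) }
  in
  List.iter (fun task -> Queue.add (fun () -> start task) ready) tasks;
  run_next ()

let () =
  run_all
    [ (fun () -> say "a1"; perform Yield; say "a2");
      (fun () -> say "b1"; perform Yield; say "b2") ]
```

After this runs, List.rev !log is ["a1"; "b1"; "a2"; "b2"]: each task runs until it yields, and the other task is then resumed. Everything here stays on one domain, so it shows control flow only, not parallelism.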
On another front, unless I’m mistaken, only the x86-64 architecture is currently supported (see #87 and #86) and, regarding operating systems, Windows is unsupported (I don’t know whether members of the BSD family are).
Some more things: compaction is not implemented yet, and ephemerons are unsupported. Ephemerons in their current form add trickiness to the GC, so a direction would be to only support a less expressive flavour.
For backward compatibility, Thread will still be subject to the global interlock even in the multicore runtime. If you want full parallelism on the multicore runtime, you need to adopt the Domain and Atomic interface, or some new thing built on top of it.
I do understand that Thread will not change, but I’m wondering if domains will provide an API that is designed to make migrating from Thread easy (i.e. at least with similar functions to create domains, and with Mutex and Condition compatibility).
I’d rather not. Having implemented Thread in terms of Domain and Atomic (cf. #240) with an option to disable the global interlock, I’ve now really soured on the API defined by the Thread library. I dislike it a lot, and wouldn’t use it in new code. I’d much prefer something that leveraged the algebraic effects feature and an effect scheduler that uses a domain pool to achieve multicore concurrency.
I mean, what about code that already uses Thread a lot? I don’t dislike the API personally, and it’s a staple of classic Unix programming to use threads and locks. Effects are exciting, but being able to use normal threads on domains would also be useful to some of us.
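For a rough idea of what that interface looks like, here is a minimal sketch using Domain.spawn, Domain.join and the Atomic module as they later shipped in OCaml 5.0. That is an assumption: names and signatures in the multicore branch at the time of this thread may differ. The function parallel_count is hypothetical.

```ocaml
(* Requires OCaml 5.0 or later for the Domain module. *)

(* Spawn [domains] domains that each increment a shared atomic counter
   [per_domain] times, then return the final count. *)
let parallel_count ~(domains : int) ~(per_domain : int) : int =
  let counter = Atomic.make 0 in
  let worker () =
    for _ = 1 to per_domain do
      (* Atomic.incr is an atomic read-modify-write, so no lock is needed. *)
      Atomic.incr counter
    done
  in
  let spawned = List.init domains (fun _ -> Domain.spawn worker) in
  (* Wait for every domain to finish before reading the counter. *)
  List.iter Domain.join spawned;
  Atomic.get counter
```

For example, parallel_count ~domains:4 ~per_domain:1000 returns 4000, because every increment is atomic. A plain int ref in place of the atomic counter could lose updates when domains run in parallel.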
Can you elaborate more (not necessarily here) about what’s so bad with Thread?
I’m an old-timer, and I remember when it was the shiny and new thing that was supposed to be an improvement over how concurrency was done previously. I never believed it was an improvement over cooperative approaches to concurrency, because I think it’s prohibitively difficult to reason about the behavior of programs that use it.
I’m dismayed by the ongoing wave of attempts to continue sustaining the obsolete Actor Model. That mathematical model is from 1973, and it was already old when I was still too young to go on a date without adult supervision. I also don’t regard the async/await or promises model as anything fundamentally different from the Actor Model. Accordingly, I’m not very enamored with either Async or Lwt in the OCaml world.
What excites me is the possibility of using algebraic effects to implement a scheduler for something like the π-calculus (complete with a choice operator) that would allow multicore concurrency using a completely natural functional programming interface. So far, I’ve only seen that in researchy toy languages, but I think it could be made practical in multicore OCaml with algebraic effects.
As far as I understand, the effect system - whether algebraic or otherwise - does not apply to cross-domain (i.e. multicore) programming. Doing so would require some internal synchronization and communication mechanism which does not exist in the design.