Welcome to the February 2020 news update from the Multicore OCaml team, spread across the UK, India, France and Switzerland! This follows on from last month’s update, and has been put together by @shakthimaan and @kayceesrk.
The release of OCaml 4.10.0 has successfully pushed out some prerequisite features into the upstream compiler. Our work in February has focussed on getting the multicore OCaml branch “feature complete” with respect to the complete OCaml language, and doing extensive benchmarking and stress testing to test our two minor heap implementations.
To this end, a number of significant patches have been merged into the Multicore OCaml trees that essentially provide complete coverage of the language features. We encourage you to test the same for regressions and provide any improvements or report shortcomings to us. There are ongoing OCaml PRs and issues that are also under review, and we hope to complete those for the 4.11 release cycle. A new set of parallel benchmarks have been added to our Sandmark benchmarking suite (live instance here), including enhancements to the build setup.
The following PRs have been merged into Multicore OCaml:
Forcing_tagto fix concurrency bug with lazy values
Forcing_tagis used to implement lazy values to handle a concurrency bug. It behaves like a locked bit, and any concurrent access by a mutator will raise an exception on that domain.
A preliminary version of safe points has been merged into the Multicore OCaml trees. ocaml-multicore/ocaml-multicore#187 also contains more discussion and background about how coverage can be improved in future PRs.
Introduce an ‘opportunistic’ major collection slice
An “opportunistic work credit” is implemented in this PR which forms a basis for doing mark and sweep work while waiting to synchronise with other domains.
Do fflush and variable args in caml_gc_log
The caml_gc_log() function has been updated to ensure that
fflushis invoked only when GC logging is enabled.
During debugging with event trace data it is useful to reduce the buffer flush times, and hence the
EVENT_BUF_SIZEhas now been increased.
Write barrier optimization
This PR closes the regression for the
chameneos_redux_lwtbenchmarking in Sandmark by using
intnatto avoid sign extensions and cleans up
write_barrierto improve overall performance.
Unify sweep budget to be in word size
The PR updates the sweep work units to all be in word size. This is to handle the differences between the budget for setup, sweep and for large allocations in blocks.
- A lot of work is ongoing for the implementation of a synchronised minor garbage collector for Multicore OCaml, including benchmarking for the stop-the-world (stw) branch. We will publish the results of this in a future update, as we are assembling a currently comprehensive evaluation of the runtime against the mainstream runtime.
Sandmark now has support to run parallel benchmarks. We can also now about GC latency measurements for both stock OCaml and Multicore OCaml compiler.
More parallel benchmarks
A number of parallel benchmarks such as N-body, Quick Sort and matrix multiplication have now been added to Sandmark!
Promote packages. Unbreak CI.
The Continuous Integration build can now execute after updating and promoting packages in Sandmark.
Add support for collecting information about GC pausetimes on trunk
The PR now helps process the runtime log and produces a
.benchfile that captures the GC pause times. This works on both stock OCaml and in Multicore OCaml.
Read and write Irmin benchmark
A test for measuring Irmin’s merge capabilities with Git as its filesystem is being tested with different read and write rates.
A number of other parallel benchmarks like Merge sort, Floyd-Warshall matrix, prime number generation, parallel map, filter et. al. have been added to Sandmark.
- Examples using domainslib and modifying Domains are currently being worked upon for a chapter on Parallel Programming for Multicore OCaml. We will release an early draft to the community for your feedback.
One PR opened to OCaml this month, which fixes up the marshalling scheme to be multicore compatible. The complete set of upstream multicore prerequisites are labelled in the compiler issue tracker.
ocaml/ocaml#9293 Use addrmap hash table for marshaling
The hash table (addrmap) implementation from Multicore OCaml has been ported to upstream OCaml to avoid using GC mark bits to represent visitedness.
- CTF: Common Trace Format
- CI: Continuous Integration
- GC: Garbage Collector
- PR: Pull Request
As always, many thanks to our fellow OCaml developers and users who have reviewed our code, reported bugs or otherwise assisted this month.