Welcome to the August 2020 Multicore OCaml report (a few weeks late due to the August slowdown). This update, along with the previous updates, has been compiled by @shakthimaan, @kayceesrk and myself.
Several talks related to Multicore OCaml are now freely available online:
- At the OCaml Workshop, @sadiq presented “How to parallelise your code with Multicore OCaml”
- At ICFP, @kayceesrk presented “Retrofitting Parallelism onto OCaml”, which also received a Distinguished Paper award.
- At ICFP, Glenn Mével presented “Cosmo: A Concurrent Separation Logic for Multicore OCaml”.
- At the WebAssembly Community Group meeting, @kayceesrk gave a talk on Effect Handlers in Multicore OCaml. This is related to our longer-term efforts to ensure that OCaml has an efficient compilation strategy to WebAssembly.
The Multicore OCaml project saw a number of optimisations and performance improvements in August 2020:
- The PR on the implementation of systhreads with pthreads continues to undergo review and improvement. When merged, this opens up the possibility of installing dune and other packages with Multicore OCaml.
- Implementations of mutex and condition variables for the Domain module are also now under review.
- Work has begun on implementing GC safe points, so that garbage collection can occur reliably and with low latency.
We would like to particularly thank these external contributors:
- Albin Coquereau and Guillaume Bury for their comments and recommendations on building Alt-Ergo.2.3.2 with dune.2.6.0 and Multicore OCaml 4.10.0 in a sandbox environment.
- @Leonidas for testing the code size metric implementation with Core and Async, and for suggesting changes during code review.
Contributions such as these, which adapt your projects to our benchmarking suites, are always most welcome. As with previous updates, we begin with the Multicore OCaml updates, followed by the enhancements and bug fixes to the Sandmark benchmarking project. Finally, the ongoing and completed upstream OCaml tasks are listed for your reference.
Multicore OCaml
Ongoing
- ocaml-multicore/ocaml-multicore#381: Reimplementing systhreads with pthreads
  This PR has made tremendous progress, with additions to the domain API, changes to the interaction with the backup thread, and bug fixes. We are now able to build dune.2.6.1 and utop with this PR for Multicore OCaml, and it is ready for review!
- ocaml-multicore/ocaml-multicore#384: Add a primitive to insert nop instruction
  The nop primitive is introduced to mark the start and end of an instruction sequence, to aid in debugging low-level code.
- ocaml-multicore/ocaml-multicore#390: Initial implementation of Mutexes and Condition Variables
  A draft proposal that adds support for mutexes and condition variables to the Multicore runtime.
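The following minimal sketch shows what mutexes and condition variables provide: a one-slot mailbox shared between two domains. It assumes the Mutex, Condition and Domain modules of the OCaml 5 standard library. The API proposed in #390 for the Multicore runtime may differ in module names and details. The mailbox type and its functions are hypothetical.

```ocaml
(* Hypothetical one-slot mailbox shared between domains, assuming OCaml 5's
   stdlib Mutex, Condition and Domain modules. *)
type 'a mailbox = {
  lock : Mutex.t;            (* protects [contents] *)
  nonempty : Condition.t;    (* signalled when [contents] becomes Some _ *)
  mutable contents : 'a option;
}

let create () =
  { lock = Mutex.create (); nonempty = Condition.create (); contents = None }

(* Store a value and wake one waiting receiver. For brevity, this overwrites
   any unread value; a full version would also wait for the slot to empty. *)
let send mb v =
  Mutex.lock mb.lock;
  mb.contents <- Some v;
  Condition.signal mb.nonempty;
  Mutex.unlock mb.lock

(* Block until a value is present, then take it. The predicate is re-checked
   in a loop because a waiter may wake up spuriously. *)
let receive mb =
  Mutex.lock mb.lock;
  while Option.is_none mb.contents do
    Condition.wait mb.nonempty mb.lock
  done;
  let v = Option.get mb.contents in
  mb.contents <- None;
  Mutex.unlock mb.lock;
  v

let () =
  let mb = create () in
  let consumer = Domain.spawn (fun () -> receive mb) in
  send mb 42;
  Printf.printf "received %d\n" (Domain.join consumer)  (* prints "received 42" *)
```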
Completed
Optimisations
- ocaml-multicore/domainslib#16: Improvement of parallel_for implementation
  A divide-and-conquer scheme is introduced to distribute work in parallel_for, and chunk_size is made a parameter to improve scaling beyond 8-16 cores. The blue line in the following illustration shows the improvement for a few benchmarks in Sandmark using the default chunk_size with this PR:
- ocaml-multicore/multicore-opam: Use -j%{jobs}% for multicore variant builds
  Passing -j%{jobs}% in the build step for multicore variants speeds up opam installs.
- ocaml-multicore/ocaml-multicore#374: Force major slice on minor collection
  A minor collection needs to schedule a major slice, because a blocked thread that services the minor collector through handle_interrupt may otherwise not make progress on the major GC.
- ocaml-multicore/ocaml-multicore#378: Hold onto empty pools if swept while allocating
  An optimisation that improves pause times and reduces locking: a release_to_global_pool flag in the pool_sweep function lets a domain continue to hold onto empty pools that it sweeps while allocating.
- ocaml-multicore/ocaml-multicore#379: Interruptible mark and sweep
  Mark and sweep work is now interruptible, so that domains can enter stop-the-world minor collections even while one domain is performing a large task. For example, in the binary tree benchmark with four domains, major work (pink) in domain three stalls progress for the other domains, as observed in the eventlog.
  With this patch, the major work in domains two and four makes progress, as seen in the following illustration:
- ocaml-multicore/ocaml-multicore#380: Make DLS call to caml_domain_dls_get @@noalloc
  The caml_dls_get call is tagged with @@noalloc to reduce the C call overhead.
- ocaml-multicore/ocaml-multicore#382: Optimise caml_continuation_use_function
  A couple of optimisations, using caml_domain_alone and restricting caml_gc_log to DEBUG mode, yield a 25% performance improvement for the generator example.
- ocaml-multicore/ocaml-multicore#389: Avoid holding domain_lock when using backup thread
  The wait time for the main OCaml thread is reduced by changing the backup thread logic so that it does not hold the domain_lock in the BT_IN_BLOCKING_SECTION state.
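The divide-and-conquer idea behind the parallel_for change in domainslib#16 can be sketched as follows. This is not the domainslib code. Domainslib distributes tasks over a pool of domains, whereas this self-contained version assumes only OCaml 5's Domain module and spawns a fresh domain for one half of the range at each split. Because spawning a domain is expensive, the hypothetical depth parameter caps concurrency at 2^depth domains. Ranges of at most chunk_size iterations run sequentially.

```ocaml
(* Divide-and-conquer parallel_for sketch, assuming OCaml 5's Domain module
   (not domainslib's task pool). [depth] is a hypothetical cap on recursive
   spawning: at most 2^depth domains run the loop body at once. *)
let rec parallel_for ?(depth = 2) ~chunk_size ~start ~finish body =
  let len = finish - start + 1 in
  if len <= 0 then ()
  else if len <= chunk_size || depth = 0 then
    (* Small enough (or no more domains allowed): run sequentially. *)
    for i = start to finish do body i done
  else begin
    let mid = start + len / 2 in
    (* Left half [start, mid - 1] runs on a new domain... *)
    let left =
      Domain.spawn (fun () ->
          parallel_for ~depth:(depth - 1) ~chunk_size ~start ~finish:(mid - 1) body)
    in
    (* ...while the right half [mid, finish] runs on the current one. *)
    parallel_for ~depth:(depth - 1) ~chunk_size ~start:mid ~finish body;
    Domain.join left
  end

let () =
  let n = 1_000 in
  let a = Array.make n 0 in
  (* Each index is written by exactly one domain, so there are no races. *)
  parallel_for ~chunk_size:(n / 4) ~start:0 ~finish:(n - 1) (fun i -> a.(i) <- i * i);
  Printf.printf "%d\n" a.(n - 1)  (* prints 998001 *)
```

A smaller chunk_size exposes more splits but pays more overhead per split. This is the trade-off that making chunk_size a parameter lets each benchmark tune.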
Sundries
- ocaml-multicore/ocaml-multicore#391: Use Word_val for pointers with Patomic_load
  A bug fix to correctly handle Patomic_load for loaded pointers.
- ocaml-multicore/ocaml-multicore#392: Include Ipoll in leaf function test
  The Ipoll operation is now added to the leaf function test in asmcomp/amd64/emit.mlp as an external call.
Benchmarking
Ongoing
- ocaml-bench/sandmark#122: Measurements of code size
  The code size of a benchmark is one measurement that is required for the flambda branch. A PR has been created that emits a count of the CAML symbols in the output of a bench result, as shown below:
  {"name":"knucleotide.", ... ,"codesize":276859.0, ...}
- ocaml-bench/sandmark#169: Add check_url for .json and pkg-config, m4 in Makefile
  A check_url target has been defined in the Makefile to ensure that the ocaml-versions/*.json files have a URL parameter. The patch also adds pkg-config and m4 to the Ubuntu dependencies.
Completed
Benchmarks
- ocaml-bench/sandmark#107: Add Coq benchmarks
  The fraplib library from Formal Reasoning About Programs has been ported to dune and included in Sandmark for Coq benchmarks.
- ocaml-bench/sandmark#151: Evolutionary algorithm parallel benchmark
  The evolutionary algorithm parallel benchmark has been added to Sandmark.
- ocaml-bench/sandmark#152: LU decomposition: random numbers initialisation in parallel
  Random number initialisation for the LU decomposition benchmark is now done in parallel, using Domain.DLS and Random.State.
- ocaml-bench/sandmark#153: Add computationally intensive Coq benchmarks
  The BasicSyntax and AbstractInterpretation Coq files perform many minor GCs and allocations, and have been added to Sandmark as benchmarks.
- ocaml-bench/sandmark#155: Sequential version of Evolutionary Algorithm
  Sequential versions of algorithms are used for comparison with their respective parallel implementations. A sequential implementation of the Evolutionary Algorithm has now been included in Sandmark.
- ocaml-bench/sandmark#157: Minilight Multicore: Port to Task API and DLS for Random States
  The Minilight benchmark has been ported to use the Task API, along with Domain Local Storage for the random states. The speedup is shown in the following illustration:
- ocaml-bench/sandmark#164: Tweaks to multicore-numerical/game_of_life
  The board_size for the Game of Life numerical benchmark is now configurable and can be supplied as an argument.
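The Domain.DLS and Random.State pattern used in the LU decomposition and Minilight changes can be sketched as follows. Each domain lazily creates its own Random.State.t, so domains never share a single generator. This sketch assumes the OCaml 5 Domain.DLS API (new_key, get) and spawns domains directly instead of using domainslib's Task API. The helpers fill_rows and init_matrix are hypothetical and do not come from Sandmark.

```ocaml
(* Per-domain random state via Domain Local Storage (OCaml 5 Domain.DLS).
   The initialiser runs once per domain, on that domain's first [get]. *)
let rng_key : Random.State.t Domain.DLS.key =
  Domain.DLS.new_key (fun () -> Random.State.make_self_init ())

(* Hypothetical helper: fill rows [lo, hi) with random floats in [0, 1],
   drawing only from the calling domain's own generator. *)
let fill_rows m lo hi =
  let st = Domain.DLS.get rng_key in
  for i = lo to hi - 1 do
    for j = 0 to Array.length m.(i) - 1 do
      m.(i).(j) <- Random.State.float st 1.0
    done
  done

(* Hypothetical helper: split the n rows of an n x n matrix into contiguous
   blocks, one per spawned domain. Distinct domains write distinct rows. *)
let init_matrix ~ndomains n =
  let m = Array.make_matrix n n 0.0 in
  let rows_per = (n + ndomains - 1) / ndomains in
  let workers =
    List.init ndomains (fun d ->
        let lo = d * rows_per and hi = min n ((d + 1) * rows_per) in
        Domain.spawn (fun () -> fill_rows m lo hi))
  in
  List.iter Domain.join workers;
  m
```

Keeping one generator per domain avoids contention on shared mutable state. It also keeps each domain's sequence of draws independent of how the other domains are scheduled.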
Bug Fixes
- ocaml-bench/sandmark#156: Fix calculation of Nbody Multicore
  Minor fixes in the calculation of the interactions between bodies in the Nbody implementation, and the use of local ref variables to reduce writes and cache traffic.
- ocaml-bench/sandmark#158: Fix key error for Grammatrix for Jupyter notebook
  The Key Error issue with notebooks/parallel/parallel.ipynb is now resolved by passing a value to params in the multicore_parallel_run_config.json file.
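The local ref variables mentioned for the Nbody fix refer to a common OCaml pattern, sketched below. The body record and advance_velocities are hypothetical and do not reproduce Sandmark's implementation. Velocity deltas accumulate in local float refs, which ocamlopt can usually keep unboxed in registers. Each body's mutable fields are then written once per step rather than once per interaction.

```ocaml
(* Hypothetical body representation for illustration only. *)
type body = {
  mutable x : float; mutable y : float; mutable z : float;
  mutable vx : float; mutable vy : float; mutable vz : float;
  mass : float;
}

(* Update velocities from pairwise gravitational interactions over [dt].
   Deltas are summed in local refs instead of writing bi.vx etc. inside the
   inner loop, so each body's fields are written once per step. *)
let advance_velocities bodies dt =
  let n = Array.length bodies in
  for i = 0 to n - 1 do
    let bi = bodies.(i) in
    let dvx = ref 0.0 and dvy = ref 0.0 and dvz = ref 0.0 in
    for j = 0 to n - 1 do
      if j <> i then begin
        let bj = bodies.(j) in
        let dx = bi.x -. bj.x and dy = bi.y -. bj.y and dz = bi.z -. bj.z in
        let d2 = dx *. dx +. dy *. dy +. dz *. dz in
        let mag = dt /. (d2 *. sqrt d2) in
        dvx := !dvx -. dx *. bj.mass *. mag;
        dvy := !dvy -. dy *. bj.mass *. mag;
        dvz := !dvz -. dz *. bj.mass *. mag
      end
    done;
    (* One write per field per body, instead of one per interaction. *)
    bi.vx <- bi.vx +. !dvx;
    bi.vy <- bi.vy +. !dvy;
    bi.vz <- bi.vz +. !dvz
  done
```

In this formulation, iteration i reads positions but writes only to bodies.(i). The outer loop could therefore be split across domains without two domains writing the same body, at the cost of evaluating each pair twice.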
Sundries
- ocaml-bench/sandmark#154: Revert PARAMWRAPPER changes
  Undo the PARAMWRAPPER configuration for parallel benchmark runs in the Makefile, as it is not required for sequential execution.
- ocaml-bench/sandmark#160: Specify prefix,libdir for alt-ergo sandbox builds
  The alt-ergo library and parser require prefix and libdir to be specified with configure in order to build in a sandbox environment. The initial discussion is available at OCamlPro/alt-ergo#351.
- ocaml-bench/sandmark#162: Avoid installing packages which are unused for Multicore runs
  The PACKAGES variable in the Makefile has been simplified to include only the dependency packages that are required to build Sandmark.
- ocaml-bench/sandmark#163: Update to domainslib 0.2.2 and use default chunk_size
  The domainslib dependency has been updated to the 0.2.2 release, and chunk_size for various benchmarks now defaults to num_tasks/num_domains.
OCaml
Ongoing
- ocaml/ocaml#9756: Garbage collectors colour change
  This PR is needed for use with the Multicore OCaml major collector, as it removes the need for the gray colour in the garbage collector (GC) colour scheme.
Completed
- ocaml/ocaml#9722: EINTR-based signals, again
  The patch provides a new implementation that addresses a collection of locking, signal-handling and error-checking issues.
Our thanks to all the OCaml developers and users in the community for their support and contribution to the project. Stay safe!
Acronyms
- API: Application Programming Interface
- DLS: Domain Local Storage
- GC: Garbage Collector
- OPAM: OCaml Package Manager
- LU: Lower Upper (decomposition)
- PR: Pull Request