Multicore OCaml: September 2020
Welcome to the September 2020 Multicore OCaml report! This update, along with the previous
monthly updates, has been compiled by @shakthimaan, @kayceesrk and @avsm.
Big news this month is that the systhreads compatibility support PR has been merged, which means that Dune (and other users of the Thread
module) can compile out of the box. You can now compile the multicore OCaml fork conveniently using the new opam compiler
plugin (see announcement):
opam update
opam compiler create "ocaml-multicore/ocaml-multicore:no-effect-syntax"
eval $(opam env)
This selects the branch of multicore OCaml that omits the experimental effect
syntax, and thus works with the existing ppx ecosystem. It’s quite fun opam installing ecosystem packages and seeing them operate out of the box at long last. There are still a few rough edges to the thread compatibility support (mainly at the C compatibility layer, such as registering external C threads with the GC), but these will be worked out in the coming weeks. We’d like to hear of any build failures you encounter in the opam universe with this: please report them on https://github.com/ocaml-multicore/ocaml-multicore/issues
A number of performance improvements to the multicore OCaml GC and the Sandmark benchmarking project have also been completed through September:
- we have now included the Kronecker implementation from the Graph500 benchmarks in Sandmark
- the addition of an n-queens benchmark is in progress
- benchmark runs now provide a count of the OCaml symbols as a code size metric
- work on building Tezos with multicore OCaml, and integration with the Sandmark
benchmarking test suite has also begun.
We have also begun an effort to port Lwt to take advantage of parallelism via `Lwt_preemptive`. Code samples have been written and test runs performed, and Sudha has written an introductory blog post about her early results. Note that this work doesn’t change the core behaviour of Lwt (a cooperative futures framework with no context switching between `bind` calls), but allows parallelism via explicit calls to background preemptive threads.
On the upstreaming efforts to OCaml, the 4.12 release will freeze earlier than usual in October, and so we finished submitting the last of the garbage collector colour changes and are aiming for the work on reliable safe points to go into OCaml 4.13. There have been a lot of runtime changes packed into 4.12 already, and so we will issue a call for testing when the release candidate of 4.12 is cut.
Onto the details of the PRs. As with the previous updates, the Multicore OCaml updates are listed first, which are then followed by the enhancements to the Sandmark benchmarking project. The upstream OCaml ongoing and completed updates are finally mentioned for your reference.
Multicore OCaml
Ongoing
- ocaml-multicore/domainslib#17
  Implement channels using Mutex and Condition Variables

  The `lib/chan.ml` sources have been updated to implement channels using Mutex and Condition Variables, and a `LU_decomposition_multicore.exe` test has been added for the same.

- ocaml-multicore/ocaml-multicore#381
  Reimplementing systhreads with pthreads

  This PR is actively being reviewed for the use of `pthreads` in Multicore OCaml. It introduces Domain Execution Contexts (DEC), which allow multiple threads to run atop a domain.

- ocaml-multicore/ocaml-multicore#394
  Changes to polling placement

  Polls are placed at the start of functions and on the back-edges of loops, instead of using Feeley’s algorithm. This is a work-in-progress.

- ocaml-multicore/ocaml-multicore#401
  Do not handle interrupts recursively

  A domain-local variable is introduced to prevent interrupts from being handled recursively.

- ocaml-multicore/ocaml-multicore#402
  Split handle_gc_interrupt into handling remote and polling sections

  A `caml_poll_gc_work` function is introduced that contains the GC work previously done in `caml_handle_gc_interrupt`. This lets `stw_handler` make calls to poll without handling service interrupts, which could lead to unwanted recursion.

- ocaml-multicore/ocaml-multicore#403
  Segmentation fault when building Tezos on Multicore 4.10.0 with no-effects-syntax

  This is an ongoing investigation into why the package `tezos-embedded-protocol-packer` in Tezos causes a segmentation fault when building with Multicore OCaml.
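The guard described for ocaml-multicore#401 can be pictured with a small OCaml analogy. This is only a sketch: the actual change lives in the C runtime and uses a domain-local variable, whereas `handling` and `handle_interrupts` below are hypothetical names, and a plain global reference stands in for domain-local state.

```ocaml
(* Hypothetical flag: true while an interrupt handler is already running.
   The runtime keeps such state per domain; a global ref stands in here. *)
let handling = ref false

(* Run [handler] unless a handler is already active.
   Returns true if the handler ran, false if the call was skipped. *)
let handle_interrupts handler =
  if !handling then false
  else begin
    handling := true;
    (* Always clear the flag, even if the handler raises. *)
    Fun.protect ~finally:(fun () -> handling := false) handler;
    true
  end
```

A nested call made from inside a running handler returns `false` immediately instead of recursing, which is the behaviour the PR is after.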
Completed
Domainslib
- ocaml-multicore/domainslib#19
  Finer grain signalling with mutex condvar for Channels

  The use of fine-grained locking with mutexes and condition variables improves performance at larger core counts, compared with a single mutex for all the signalling.
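To make the channel work above more concrete, here is a minimal sketch of an unbounded channel built from one mutex and one condition variable. It is not the domainslib implementation: the type `chan` and the functions `make`, `send` and `recv` are illustrative names, and the sketch assumes the `Mutex` and `Condition` modules are available (they are part of the standard library in OCaml 5, and come from the threads library in earlier releases). The finer-grained signalling in domainslib#19 goes further than this single-lock design.

```ocaml
(* An unbounded channel guarded by a single mutex (illustrative only). *)
type 'a chan = {
  items : 'a Queue.t;        (* pending messages, oldest first *)
  lock : Mutex.t;            (* protects [items] *)
  non_empty : Condition.t;   (* signalled when a message is added *)
}

let make () =
  { items = Queue.create ();
    lock = Mutex.create ();
    non_empty = Condition.create () }

(* Enqueue [v] and wake one waiting receiver, if any. *)
let send c v =
  Mutex.lock c.lock;
  Queue.push v c.items;
  Condition.signal c.non_empty;
  Mutex.unlock c.lock

(* Block until a message is available, then dequeue it. *)
let recv c =
  Mutex.lock c.lock;
  (* Re-check the queue after every wake-up: waits can be spurious. *)
  while Queue.is_empty c.items do
    Condition.wait c.non_empty c.lock
  done;
  let v = Queue.pop c.items in
  Mutex.unlock c.lock;
  v
```

With a single lock, every sender and receiver contends on the same mutex, which is the kind of bottleneck that finer-grained signalling is meant to relieve on machines with many cores.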
Multicore OPAM
- ocaml-multicore/multicore-opam#31
  Patch dune.2.7.1 for Multicore OCaml

  The opam file for dune.2.7.1 has been added along with a patch to `bootstrap.ml` to get it working for Multicore OCaml, thanks to Chaitanya Koparkar.

- ocaml-multicore/multicore-opam#32
  Add ocamlfind-secondary dependency to dune

  The installation of `dune` requires `ocamlfind-secondary` as a dependency for dune.2.7.1, and it has been added to the OPAM file.
Multicore OCaml
- ocaml-multicore/ocaml-multicore#395
  Move to SPIN_WAIT for all spins and usleep in SPIN_WAIT

  The PR provides the SPIN_WAIT macro for all the busy spin wait loops, and uses `caml_plat_spin_wait` when busy waiting. This ensures that the same spin strategy is used in different places in the code.

- ocaml-multicore/ocaml-multicore#397
  Relaxation of backup thread signalling

  The signalling to the backup thread from the mutator thread when leaving a blocking section is modified. It reduces the potential Operating System scheduling when re-entering OCaml.

- ocaml-multicore/ocaml-multicore#400
  Demux eventlog for backup thread

  The events in the backup thread were emitting the same process ID as the main thread, and this PR separates them.

In the above illustration, the backup threads are active when the
main thread is waiting on a condition variable.
Benchmarking
Ongoing
- ocaml-bench/sandmark#159
  Implement a better way to describe tasklet cpulist

  We need a cleaner way to obtain the taskset list of cores for a benchmark run when we are provided with a number of domains. We should be able to specify hyper-threaded cores, NUMA zones to use, and the specific cores to use for the parallel benchmarks.

- ocaml-bench/sandmark#173
  Addition of nqueens benchmark to multicore-numerical

  A draft version of the classical `n queens` benchmark has been added for review in Sandmark. This includes both the single and multicore implementations.
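For readers unfamiliar with the n queens problem, a minimal sequential sketch follows. It is not the code from sandmark#173: the function name `nqueens` and the backtracking layout are illustrative choices, and it counts solutions rather than measuring anything.

```ocaml
(* Count the placements of [n] non-attacking queens on an n x n board,
   placing one queen per row and backtracking on conflicts. *)
let nqueens n =
  (* Occupancy of columns and of the two diagonal directions. *)
  let cols = Array.make n false in
  let diag1 = Array.make (2 * n) false in
  let diag2 = Array.make (2 * n) false in
  let rec place row =
    if row = n then 1
    else begin
      let count = ref 0 in
      for col = 0 to n - 1 do
        let d1 = row + col and d2 = row - col + n in
        if not (cols.(col) || diag1.(d1) || diag2.(d2)) then begin
          (* Place the queen, recurse into the next row, then undo. *)
          cols.(col) <- true; diag1.(d1) <- true; diag2.(d2) <- true;
          count := !count + place (row + 1);
          cols.(col) <- false; diag1.(d1) <- false; diag2.(d2) <- false
        end
      done;
      !count
    end
  in
  place 0
```

For example, `nqueens 8` evaluates to 92. The subtrees rooted at each first-row column are independent of one another, which is what makes the problem a natural candidate for a multicore variant.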
Completed
- ocaml-bench/ocaml_bench_scripts#11
  Add support for configure option and OCAMLRUNPARAM

  The `ocaml_bench_scripts` has been updated to support passing `configure` options and OCAMLRUNPARAM when building and running the benchmarks in Sandmark.

- ocaml-bench/sandmark#122
  Measurements of code size

  The output .bench JSON file produced from the benchmarks now includes a code size metric for the number of CAML symbols. A sample benchmark output is shown below:

  `{"name":"knucleotide.", ... ,"codesize":276859.0, ...}`

  The code size count for a few of the benchmarks is given below:

  | Benchmark  | Count     |
  |------------|-----------|
  | alt-ergo   | 2_822_040 |
  | coqc       | 5_869_305 |
  | cpdf       | 1_131_376 |
  | nbody.exe  | 276_710   |
  | stress.exe | 84_061    |
  | fft.exe    | 38_914    |

- ocaml-bench/sandmark#170
  Graph500 SEQ

  The Graph500 benchmark with a Kronecker graph generator has now been added to Sandmark. The generator builds three kernels for graph construction, Breadth First Search, and Single Source Shortest Paths.

- ocaml-bench/sandmark#172
  Remove `Base`, `Stdio` orun dependency for trunk

  The `orun` sources in Sandmark have been updated to remove the dependency on both `Base` and `Stdio`. They have been replaced with functions from `Stdlib`, `List`, `String` and `Str`.

- ocaml-bench/sandmark#174
  Cleanup our use of sudo for chrt

  The use of `sudo` has been removed from the Makefile for running parallel benchmarks, to avoid creating output files and directories that require root permissions for access. The use of `RUN_BENCH_TARGET=run_orunchrt` will execute the benchmarks using `chrt -r 1`. The user can give permissions to the `chrt` binary using:

  `$ sudo setcap cap_sys_nice=ep /usr/bin/chrt`
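As background for the Graph500 entry above, the following is a simplified sketch of a Kronecker edge generator. It is not the Sandmark code: the names `kronecker_edges`, `prob_a`, `prob_b` and `prob_c` are illustrative, the initiator probabilities are taken from the Graph500 specification rather than from this report, and the sketch leaves out the vertex and edge permutation and the edge weights that the full generator produces.

```ocaml
(* Initiator probabilities from the Graph500 specification (assumed here);
   the fourth quadrant gets the remaining 0.05. *)
let prob_a = 0.57
let prob_b = 0.19
let prob_c = 0.19

(* Generate [edge_factor * 2^scale] directed edges as (src, dst) pairs
   over vertices numbered 0 .. 2^scale - 1. *)
let kronecker_edges ~scale ~edge_factor =
  let n_edges = edge_factor * (1 lsl scale) in
  let ab = prob_a +. prob_b in
  let c_norm = prob_c /. (1.0 -. ab) in
  let a_norm = prob_a /. ab in
  Array.init n_edges (fun _ ->
    let src = ref 0 and dst = ref 0 in
    for bit = 0 to scale - 1 do
      (* Choose the row half first, then the column half given the row. *)
      let i_bit = Random.float 1.0 > ab in
      let threshold = if i_bit then c_norm else a_norm in
      let j_bit = Random.float 1.0 > threshold in
      if i_bit then src := !src lor (1 lsl bit);
      if j_bit then dst := !dst lor (1 lsl bit)
    done;
    (!src, !dst))
```

Calling `kronecker_edges ~scale:4 ~edge_factor:16` returns 256 edges over 16 vertices. Because each bit is drawn from a skewed distribution, low-numbered vertices attract far more edges than high-numbered ones, which is why the full generator shuffles vertex labels afterwards.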
OCaml
Ongoing
- ocaml/ocaml#9876
  Do not cache young_limit in a processor register

  The PR removes the caching of `young_limit` in a register for the ARM64, PowerPC and RISC-V ports, as it is problematic during polling for signals and inter-domain communication in Multicore OCaml.
Completed
- ocaml/ocaml#9756
  Garbage collectors colour change

  The gray colour scheme in the Garbage Collector has been removed to facilitate merging with the Multicore OCaml collector. The existing benchmarks in the Sandmark suite that did overflow the mark stack are shown in the illustration below, and the change has little negative impact on them.
As always, we would like to thank all the OCaml developers and users in the community for their continued support and contribution to the project. Be well!
Acronyms
- ARM: Advanced RISC Machine
- BFS: Breadth First Search
- DEC: Domain Execution Context
- GC: Garbage Collector
- JSON: JavaScript Object Notation
- NUMA: Non-Uniform Memory Access
- OPAM: OCaml Package Manager
- OS: Operating System
- PR: Pull Request
- RISC-V: Reduced Instruction Set Computing - V
- SSSP: Single Source Shortest Path