Welcome to the March 2021 Multicore OCaml monthly report! The following update and the previous ones have been compiled by me, @kayceesrk and @shakthimaan. We remain broadly on track to integrate the last of the multicore prerequisites into the next (4.13) release, and to propose domains-only parallelism for OCaml 5.0.
Upstream OCaml 4.13 development
The complex safe points PR (#10039) is continuing to make progress, with more refinement towards reducing the binary size increase that results from the introduction of more polling points. Special thanks to @damiendoligez for leaping in with a PR-to-the-PR to home in on a workable algorithm!
Multicore OCaml trees
If there’s one thing we’re not going to miss, it’s git rebasing. The multicore journey began many moons ago with OCaml 4.02, and continued through 4.04, 4.06, and the current 4.10. We’re pleased to announce that the hopefully-last rebase of the multicore OCaml trees, to OCaml 4.12.0, is now available. There is also a simpler naming scheme that reflects our upstreaming strategy more closely:
- OCaml 4.12.0+domains is the domains-only parallelism that will be submitted for OCaml 5.0
- OCaml 4.12.0+domains+effects is the version with domains parallelism and effects-based concurrency.
You can find opam installation instructions for these over at the multicore-opam repository. There is even an ocaml-lsp-server available, so that your favourite IDE should just work!
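As a rough illustration of what domains-only parallelism looks like from user code, here is a minimal sketch that splits a sum across two domains. It assumes a compiler that ships the `Domain` module with `Domain.spawn` and `Domain.join` (such as the 4.12.0+domains switch); the helper names `sum_range` and `parallel_sum` are made up for this example.

```ocaml
(* A minimal sketch of domains-only parallelism.
   Assumption: the compiler provides Domain.spawn and Domain.join.
   sum_range and parallel_sum are hypothetical helper names. *)

(* Sum the integers in the half-open range [lo, hi). *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + i
  done;
  !acc

(* Split [0, n) into two halves and sum each half in its own domain. *)
let parallel_sum n =
  let mid = n / 2 in
  let d1 = Domain.spawn (fun () -> sum_range 0 mid) in
  let d2 = Domain.spawn (fun () -> sum_range mid n) in
  (* Domain.join blocks until the domain finishes and returns its result. *)
  Domain.join d1 + Domain.join d2

let () = Printf.printf "sum = %d\n" (parallel_sum 1_000)
```

Each domain works on its own half of the range and only the final integer results are combined, so no mutable state is shared between the two.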
Domains-only parallelism trees
The bulk of effort this month has been around the integration and debugging of Domain Local Allocation Buffers (DLABs), and also chasing down corner-case failures from stress testing and opam bulk builds. For details, see the long list of PRs in the next section.
We’re also cleaning up historical vestiges to reduce the diff to OCaml trunk, which clears the path to generating clean OCaml 5.0 PRs for upstream integration.
Concurrency and Effects trees
The camera-ready paper for PLDI 2021 on Retrofitting Effect handlers onto OCaml is now available on arXiv. The code described in the paper can be used via the 4.12.0+domains+effects opam switches. Please feel free to keep any comments coming to @kayceesrk and myself.
We’ve also been hacking on the multicore IO stack, and are just beginning to combine concurrency (via effects) and parallelism (via domains) with Linux io_uring, macOS’ Grand Central Dispatch and Windows IOCP. We’ll have more to report on this over the next few months, but early benchmarking numbers on Linux are promising.
CI and Benchmarking
We are continuing to expand the testing for different CI configurations for the project. With respect to Sandmark benchmarking, we are in the process of adding the Irmin layers.ml benchmark. There is also an end-to-end pipeline of using the OCurrent current-bench framework to give us benchmarking results from PRs that can be compared to previous runs.
As always, we begin with the Multicore OCaml updates, which are then followed by the ongoing and completed tasks for the Sandmark benchmarking project. Finally, the upstream OCaml work is listed for your reference.
Detailed Updates
Multicore OCaml
Ongoing
DLAB
- ocaml-multicore/ocaml-multicore#484 Thread allocation buffers
  The PR provides an implementation for thread local allocation buffers, or Domain Local Allocation Buffers. Code review and testing of the changes are in progress.
- ocaml-multicore/ocaml-multicore#508 Domain Local Allocation Buffers
  This is an extension to the Thread allocation buffers PR with initialization, lazy resizing of the global minor heap size, and a rebase to the 4.12 branch.
Testing
- ocaml-multicore/ocaml-multicore#522 Building the runtime with -O0 rather than -O2 causes testsuite to fail
  The runtime tests fail when using -O0 instead of -O2 and this needs to be investigated further.
- ocaml-multicore/ocaml-multicore#526 weak-ephe-final issue468 can fail with really small minor heaps
  The weak-ephe-final tests with a small minor heap (4096 words) cause the issue468 test to fail.
- ocaml-multicore/ocaml-multicore#528 Expand CI runs
  A list of requirements to expand the scope and execution of our existing CI runs for comprehensive testing.
Sundries
- ocaml-multicore/ocaml-multicore#514 Update instructions in ocaml-variants.opam
  The ocaml-variants.opam and configure.ac files have been updated to use the Multicore OCaml repository, and to use a local switch instead of a global one. The current Multicore OCaml is at the 4.12 branch.
- ocaml-multicore/ocaml-multicore#523 Systhreads Mutex raises Sys_error
  The error checking for Systhreads Mutex should be in line with trunk, instead of the fatal errors reported by Multicore OCaml.
- ocaml-multicore/ocaml-multicore#527 Port eventlog to CTF
  The eventlog implementation has to be ported to the Common Trace Format. The log output should be consistent with the parallel_minor_gc output, and stress testing needs to be performed.
Completed
Upstream
- ocaml-multicore/ocaml-multicore#490 Remove getmutablefield from bytecode
  The bytecode compiler and interpreter have been updated by removing the getmutablefield opcodes.
- ocaml-multicore/ocaml-multicore#496 Replace caml_initialize_field with caml_initialize
  A patch to replace caml_initialize_field, which was earlier used with the concurrent minor collector, with caml_initialize.
- ocaml-multicore/ocaml-multicore#503 Re-enable lib-obj and asmcomp/is_static tests
  The lib-obj and asmcomp/is_static tests have been re-enabled and the configure settings have been updated for Multicore NO_NAKED_POINTERS.
- ocaml-multicore/ocaml-multicore#506 Replace Op_val with Field
  The use of Op_val (x)[i] has been replaced with Field (x, i) to be consistent with the trunk implementation.
- ocaml-multicore/ocaml-multicore#507 Change interpreter to use naked code pointers
  The changes have been made to identify naked pointers in the interpreter stack to be compatible with trunk.
- ocaml-multicore/ocaml-multicore#516 Remove caml_root API
  The caml_root variables have been changed to value type and are managed as generational global roots. Hence, the caml_root API is now removed.
DLAB
- ocaml-multicore/ocaml-multicore#511 Allocate unique root token on the major heap instead of the minor
  The unique root token is now allocated on the major heap, with an allocation that does not raise any exception, and exits cleanly when a domain creation fails.
- ocaml-multicore/ocaml-multicore#513 Clear the minor heap at the end of a collection in debug runtime
  A debug value is written to every element of the minor heap for debugging failures. We now clear the minor heap at the end of a minor collection.
- ocaml-multicore/ocaml-multicore#519 Make timing test more robust
  The timing.ml test has been updated to be more resilient for testing with DLABs.
Enhancements
- ocaml-multicore/ocaml-multicore#477 Move TLS areas to a dedicated memory space
  In order to support Domain Local Allocation Buffers, we now move the TLS areas to their own allotted memory space, thereby changing the way we allocate an individual domain’s TLS.
- ocaml-multicore/ocaml-multicore#480 Remove leave_when_done and friends from STW API
  The barriers from caml_try_run_on_all_domains* and stw_request are removed by cleaning up the stw_request.leave_when_done implementation.
- ocaml-multicore/ocaml-multicore#481 Don’t share array amongst domains in gc-roots tests
  Every domain should have its own array, and the parallel global roots tests have been updated with this change.
- ocaml-multicore/ocaml-multicore#494 Stronger invariants on unix_fork
  We now enforce stronger invariants such that no other domain can run alongside domain 0 (caml_domain_alone) for unix_fork.
- ocaml-multicore/ocaml-multicore#515 Add memprof stubs to build and stdlib
  The required memprof functions have been added to build stdlib, and also to build memprof for the runtime.
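The "every domain should have its own array" rule from ocaml-multicore#481 can be pictured with a small sketch. This is not the code from the gc-roots tests; it only illustrates the pattern of allocating the array inside the spawned closure, and it assumes a compiler that provides `Domain.spawn` and `Domain.join`. The names `work_in_own_array` and `run` are hypothetical.

```ocaml
(* Sketch only: each domain allocates and mutates a private array.
   Assumption: Domain.spawn and Domain.join are available.
   work_in_own_array and run are hypothetical names. *)

(* Allocate an array in the calling domain, fill it with squares,
   and return the sum of its elements. *)
let work_in_own_array size =
  let a = Array.make size 0 in
  for i = 0 to size - 1 do
    a.(i) <- i * i
  done;
  Array.fold_left ( + ) 0 a

(* Spawn num_domains domains; the array is created inside each closure,
   so no array is ever visible to more than one domain. *)
let run num_domains size =
  let domains =
    List.init num_domains (fun _ ->
        Domain.spawn (fun () -> work_in_own_array size))
  in
  List.map Domain.join domains

let () =
  List.iter (fun s -> Printf.printf "%d\n" s) (run 4 10)
```

Because the allocation happens inside the function passed to `Domain.spawn`, the domains never write to the same array.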
Lazy Updates
- ocaml-multicore/ocaml-multicore#501 Safepoints lazy fix
  The lazy implementation needs to be aware of safe points, and we need to differentiate between recursive forcing of lazy values and parallel forcing. These fixes are from ocaml-multicore#492 and ocaml-multicore#493.
- ocaml-multicore/ocaml-multicore#505 Add a unique domain token to distinguish lazy forcing failure
  A caml_ml_domain_unique_token has been added to handle racy access by multiple mutators. This fixes the issue of using the domain id (int) to identify the forcing domain of a lazy block.
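To make the distinction in ocaml-multicore#501 concrete, here is a small sketch of recursive forcing: a lazy value whose own computation forces it again. It relies only on the standard `Lazy.force` and the `Lazy.Undefined` exception; the names `self` and `forces_recursively` are made up for illustration. Parallel forcing, where two domains race on the same lazy value, is not shown.

```ocaml
(* Sketch only: recursive forcing of a lazy value.
   self and forces_recursively are hypothetical names. *)

(* The body of this lazy value forces the value itself. *)
let rec self = lazy (Lazy.force self + 1)

(* Returns true when forcing [self] is reported as Lazy.Undefined. *)
let forces_recursively () =
  match Lazy.force self with
  | _ -> false
  | exception Lazy.Undefined -> true

let () =
  if forces_recursively () then
    print_endline "recursive forcing raised Lazy.Undefined"
```

The recursive case is detected within a single domain, which is why it needs to be told apart from a second domain arriving while the first is still forcing.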
Fixes
- ocaml-multicore/ocaml-multicore#487 systhreads: set gc_regs_buckets and friends to NULL at thread startup
  Pointers have been initialized to NULL in systhreads/st_stubs.c, which solves the segmentation fault observed when running the Layers benchmark.
- ocaml-multicore/ocaml-multicore#491 Reinitialize child locks after fork
  The runtime needs to operate correctly after a fork, and this patch fixes it with proper resetting of the domain lock.
- ocaml-multicore/ocaml-multicore#495 Fix problems with finaliser orphaning
  A fix for how we merge finalization tables for orphaned finaliser work. A test case has also been added to the PR.
- ocaml-multicore/ocaml-multicore#499 Fix backtrace unwind
  The unwinding of stacks over callbacks was not happening correctly and the discrepancy in caml_next_frame_descriptor is now resolved.
- ocaml-multicore/ocaml-multicore#509 Fix for bad setup of Continuation_already_taken exception in bytecode
  A patch to fix the Continuation_already_taken exception which was not set up as needed in the bytecode execution.
- ocaml-multicore/ocaml-multicore#510 Update a testcase in principality-and-gadts.ml
  A change in principality-and-gadts.ml to log the correct output as compared to the 4.12 branch in ocaml/ocaml.
Ecosystem
- ocaml-multicore/multicore-opam#46 Multicore compatible ocaml-migrate-parsetree.2.1.0
  The ocaml-migrate-parsetree package uses the effect syntax and now builds with Multicore OCaml parallel_minor_gc branch.
- ocaml-multicore/multicore-opam#47 Multicore compatible ppxlib
  The effect syntax has been added to ppxlib and is also now compatible with Multicore OCaml.
- ocaml-multicore/multicore-opam#49 4.12 Multicore configs
  Added configurations to install 4.12.0+domains+effects and 4.12.0+domains OCaml variants.
- ocaml-multicore/ocaml-multicore#473 Building on musl requires dynamically linked execinfo
  The opam files to allow installation on musl-based environments for Multicore OCaml have been added to the repository.
- ocaml-multicore/ocaml-multicore#482 Check for -lexecinfo in order to build on musl/alpine
  A configure script has been added which checks for -lexecinfo in order to support building Multicore OCaml on musl/alpine.
Documentation
- ocaml-multicore/ocaml-multicore#502 Update README to introduce 4.12+domains+effects and 4.12+domains
  We have updated the README file with the current list of active branches, and the names of the historic variants.
- ocaml-multicore/ocaml-multicore#520 Clarify comment on RacyLazy
  A documentation update in stdlib/lazy.mli that clarifies when RacyLazy and Undefined are raised.
Sundries
- ocaml-multicore/ocaml-multicore#486 Sync no-effects-syntax to parallel_minor_gc branch
  The ocaml-multicore:no-effects-syntax branch is now up to date with the parallel_minor_gc branch changes.
- ocaml-multicore/ocaml-multicore#489 Remove promote_to
  The promote_to function was used in the concurrent minor GC. It is not required any more and hence has been removed.
- ocaml-multicore/ocaml-multicore#500 Replace caml_modify_field with caml_modify
  The caml_modify_field is no longer necessary and has been replaced with caml_modify.
Benchmarking
Ongoing
- ocaml-bench/sandmark#204 Adding layers.ml as a benchmark to Sandmark
  The inclusion of Irmin layers.ml benchmark with updates to all its dependency requirements.
- ocaml-bench/sandmark#209 Use rule target kronecker.txt and remove from macro_bench
  A review of the graph500seq kernel1.ml implementation has been done, and code changes have been proposed. The macro_bench tag will be retained for the graph500 benchmarks.
- ocaml-bench/sandmark#212 Increasing the major heap allocation on some benchmarks
  A work in progress to add more longer running benchmarks that involve major heap allocation. Some of the parameters have been updated with higher values, and more loops have been added as well.
- We now have integrated the build of Sandmark 2.0 with current-bench for CI. The results of the benchmark runs are now pushed to a PostgreSQL database as shown below:

      docker=# select * from benchmarks;
      -[ RECORD 1 ]--+-------------------------------------------------------
      run_at         | 2021-03-26 11:21:20.64
      repo_id        | local/local
      commit         | 55c6fb6416548737b715d6d8fde6c0f690526e42
      branch         | 2.0.0-alpha+001
      pull_number    |
      benchmark_name |
      test_name      | coq.BasicSyntax.v
      metrics        | {"maxrss_kB": 678096, "time_secs": 101.99969387054443}
      duration       | 00:37:52.776357
      -[ RECORD 2 ]--+-------------------------------------------------------
      run_at         | 2021-03-26 11:21:20.64
      repo_id        | local/local
      commit         | 55c6fb6416548737b715d6d8fde6c0f690526e42
      branch         | 2.0.0-alpha+001
      pull_number    |
      benchmark_name |
      test_name      | thread_ring_lwt_mvar.20_000
      metrics        | {"maxrss_kB": 8096, "time_secs": 2.6146790981292725}
      duration       | 00:37:52.776357
      ...

  We will continue to work on adding more workflows and features to current-bench to support Sandmark builds.
Completed
- ocaml-bench/sandmark#202 Added bench clean target in the Makefile
  A benchclean target to remove the generated benchmarks and its results while still retaining the _opam folder has been added to the Makefile.
- ocaml-bench/sandmark#203 Implement ITER support
  The use of ITER variable is now supported in Sandmark, and you can run multiple iterations of the benchmarks. For example, with ITER=2, a couple of summary .bench files are created with the benchmark results as shown below:

      $ TAG='"run_in_ci"' make run_config_filtered.json
      $ ITER=2 RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench
      $ ls _results/
      4.10.0+multicore_1.orun.summary.bench 4.10.0+multicore_2.orun.summary.bench

- ocaml-bench/sandmark#208 Fix params for simple-tests/capi
  A minor fix in run_config.json to correctly pass the arguments to the simple-tests/capi benchmark execution. You can verify the same using the following commands:

      $ TAG='"lt_1s"' make run_config_filtered.json
      $ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench

- ocaml-bench/sandmark#210 Don’t share array in global roots parallel benchmarks
  A patch to not share array in global roots implementation for parallel benchmarks.
- ocaml-bench/sandmark#213 Resolve dependencies for 4.12.1+trunk, 4.12.0+domains and 4.12.0+domains+effects
  The dependencies/packages have now been updated to be able to build 4.12.1+trunk, 4.12.0+domains and 4.12.0+domains+effects branches with Sandmark.
OCaml
Ongoing
- ocaml/ocaml#10039 Safepoints
  The review of the Safepoints PR is in progress. Special thanks to Damien Doligez for his code suggestions on safepoints and inserting polls. There is still work to be done on optimizations.
Many thanks to all the OCaml users, developers and contributors in the
community for their support to the project. Stay safe!
Acronyms
- API: Application Programming Interface
- CI: Continuous Integration
- CTF: Common Trace Format
- DLAB: Domain Local Allocation Buffer
- GC: Garbage Collector
- OPAM: OCaml Package Manager
- PR: Pull Request
- STW: Stop The World
- TLS: Thread Local Storage