Welcome to the June 2021 Multicore OCaml monthly report! This month’s update, along with the previous updates, has been compiled by @avsm, @ctk21, @kayceesrk and @shakthimaan.
Our overall goal remains on track for generating a preview tree for OCaml 5.0 multicore domains-only parallelism over the summer.
Ecosystem compatibility for 4.12.0+domains
In May’s update, I noted that our focus was now on adapting the ecosystem to work well with multicore, and I’m pleased to report that this is progressing very well.
- The 4.12.0+domains multicore compiler variant has been merged into mainline opam-repo, so you can now `opam switch 4.12.0+domains` directly. The `base-domains` package is also available to mark your opam project as requiring the `Domains` module, so you can even publish your early multicore-capable libraries to the mainline opam repository now.
- The OCaml standard library was made safe for parallel use by multiple domains (wiki, issue, fixes), in particular the `Format` and `Random` modules. These modules were the main sources of incompatibilities we found when running existing OCaml code with multiple domains.
- The `Domain` module has had its interface slimmed with the removal of `critical_section`, `wait` and `notify`, which has allowed significant runtime simplification. The GC C-API interface is now implemented, which means that Jane Street’s `Base`, `Core`, and `Async` now compile on `4.12+domains` without modifications; for example, `opam install patdiff` works out of the box on a `4.12+domains` switch!
- Domainslib 0.3.0 has been released, incorporating multiple improvements including work-stealing deques for task distribution. The performance of reading domain-local variables has also been improved with a primitive and an O(1) lookup. The chapter on Parallel Programming in Multicore OCaml has been updated to reflect the latest developments with Domainslib.
This means that big application stacks should now compile pretty well with 4.12.0+domains (applications like the Tezos node and patdiff exercise a lot of the dependency trees in opam). If you do find incompatibilities, please do report them on the repository.
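As a rough illustration of what the standard library fixes above make possible, the sketch below calls `Random` concurrently from several domains. It assumes a 4.12.0+domains switch (or later) in which `Domain.spawn` and `Domain.join` are available; `hits` and `estimate_pi` are hypothetical names chosen for this example and do not come from any library.

```ocaml
(* Monte Carlo estimate of pi, with the sampling split across domains.
   Every domain calls the default Random generator directly. *)

(* Count how many random points in the unit square fall inside the
   quarter circle of radius 1. *)
let hits samples =
  let n = ref 0 in
  for _ = 1 to samples do
    let x = Random.float 1.0 and y = Random.float 1.0 in
    if (x *. x) +. (y *. y) <= 1.0 then incr n
  done;
  !n

(* Spawn [domains] workers, join them, and combine their counts. *)
let estimate_pi ~domains ~samples_per_domain =
  let workers =
    List.init domains (fun _ ->
        Domain.spawn (fun () -> hits samples_per_domain))
  in
  let total = List.fold_left (fun acc d -> acc + Domain.join d) 0 workers in
  4.0 *. float_of_int total /. float_of_int (domains * samples_per_domain)

let () =
  Printf.printf "pi ~ %f\n"
    (estimate_pi ~domains:4 ~samples_per_domain:100_000)
```

The workers share no mutable state of their own, so the only shared resource exercised here is the generator inside `Random`, which is the kind of code that previously needed care under multiple domains.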
4.12.0+domains+effects
Most of our focus has been on getting the domains-only trees (for OCaml 5.0) up to speed, but we have been progressing the direct-style effects-based IO stack as well.
- The `uring` bindings to Linux io_uring are now available on opam-repository, so you can try them out on sequential OCaml too. A good mini-project would be to add a uring backend to the existing Async or Lwt engines, if anyone wants to try a substantial contribution.
- The `eio` library is fairly usable now, for both filesystem and networking. We’ve submitted a talk to the OCaml workshop to dive into the innards of it in more detail, so stay tuned for that in the coming months if accepted. The main changes here have been performance improvements, and the HTTP stack is fairly competitive with (e.g.) `rust-hyper`.
We will soon also have a variant of this tree that removes the custom effect syntax and implements the fibres (the runtime piece) as `Obj` functions. This will further improve ecosystem compatibility and allow us to build direct-style OCaml libraries that use fibres internally to provide concurrency, but without exposing any use of effects in their interfaces.
Benchmarking and performance
We are always keen to get more benchmarks that exercise multicore features; if you want to try multicore out and help write benchmarks, there are some suggestions on the wiki. We’ve got a private server that runs a Sandmark nightly benchmark pipeline with Jupyter notebooks, and we can give access to it to anyone who submits benchmarks. We continue to test integration of Sandmark with current-bench for better integration with GitHub PRs.
As always, the Multicore OCaml ongoing and completed tasks are listed first, which are then followed by updates from the ecosystem and their associated libraries. The Sandmark benchmarking and nightly build efforts are then mentioned. Finally, the status of the upstream OCaml Safepoints PR is provided for your reference.
Multicore OCaml
Ongoing
- ocaml-multicore/ocaml-multicore#573: Backport trunk safepoints PR to multicore
  A work-in-progress to backport the Safepoints PR from ocaml/ocaml to Multicore OCaml.
- ocaml-multicore/ocaml-multicore#584: Modernise signal handling
  A patch to bring the Multicore OCaml signals implementation closer to upstream OCaml.
- ocaml-multicore/ocaml-multicore#598: Do not deliver signals to threads that have blocked them
  A draft PR to not deliver signals to threads that are in a blocked state. The without-systhreads case needs to be handled.
- ocaml-multicore/ocaml-multicore#600: Expose a few more GC variables in headers
  The `caml_young_start`, `caml_young_limit` and `caml_minor_heap_wsz` variables have been defined in the runtime.
- ocaml-multicore/ocaml-multicore#601: Domain better participants
  The `O(Max_domains)` iteration from STW signalling and the `O(n_running_domains)` iteration from domain creation have now been removed.
- ocaml-multicore/ocaml-multicore#603: Systhreads tick thread
  An initial draft PR for porting the tick thread to Multicore OCaml.
Completed
Enhancements
- ocaml-multicore/ocaml-multicore#552: Add a `force_instrumented_runtime` option to configure
  The `configure` script now accepts a new `--enable-force-instrumented-runtime` option to facilitate use of the instrumented runtime on linker invocations to obtain event logs.
- ocaml-multicore/ocaml-multicore#558: Refactor `Domain.{spawn/join}` to use no critical sections
  The critical sections in `Domain.{spawn/join}` and the use of `Domain.wait` have been removed.
- ocaml-multicore/ocaml-multicore#561: Slim down `Domain.Sync`: remove `wait`, `notify`, `critical_section`
  A breaking change in `Domain.Sync` that removes `critical_section`, `notify`, `wait`, `wait_for`, and `wait_until`. This is to remove the need for domain-to-domain messaging in the runtime.
- ocaml-multicore/ocaml-multicore#576: Including Git hash in runtime
  A Git hash is now printed in the runtime as shown below:

      $ ./boot/ocamlrun -version
      The OCaml runtime, version 4.12.0+multicore
      Built with git hash 'ae3fb4bb6' on branch 'runtime_version' with tag '<tag unavailable>'

- ocaml-multicore/ocaml-multicore#579: Primitive for fetching DLS root
  A new primitive has been implemented for fetching DLS, and is now a single `mov` instruction on `amd64`.
Upstream
- ocaml-multicore/ocaml-multicore#555: runtime: `CAML_TRACE_VERSION` is now set to a Multicore specific value
  A `CAML_TRACE_VERSION` is defined to distinguish between Multicore OCaml and trunk for the runtime.
- ocaml-multicore/ocaml-multicore#581: Move our usage of inline to `Caml_inline`
  We now use `Caml_inline` for all the C inlining in the runtime to align with upstream OCaml.
- ocaml-multicore/ocaml-multicore#589: Reintroduce `adjust_gc_speed`
  The `caml_adjust_gc_speed` function from trunk has been reintroduced to the Multicore OCaml runtime.
- ocaml-multicore/ocaml-multicore#590: runtime: stub `caml_stat_*` interfaces in gc_ctrl
  `caml_stat_*` stub functions have been created in gc_ctrl.h to introduce a compatibility layer for the GC stat utilities that are available in trunk.
Fixes
- ocaml-multicore/ocaml-multicore#562: Import fixes to the minor heap allocation code from DLABs
  The multiplication factor of two used for minor heap allocation has been removed, and the `Minor_heap_max` limit from config.h is no longer converted to a byte size for Multicore OCaml.
- ocaml-multicore/ocaml-multicore#593: Fix two issues with ephemerons
  A patch to simplify ephemeron handover during termination.
- ocaml-multicore/ocaml-multicore#594: Fix finaliser handover issue
  `caml_finish_major_cycle` is now used, leading to the major GC phase `Phase_sweep_and_mark_main`, for the correct handoff of finalisers.
- ocaml-multicore/ocaml-multicore#596: systhreads: do `st_thread_id` after initializing the thread descriptor
  The thread ID was being set before the thread descriptor was initialized, and this PR fixes the order.
- ocaml-multicore/ocaml-multicore#604: Fix unguarded `caml_skiplist_empty` in `caml_scan_global_young_roots`
  The PR introduces a `caml_iterate_global_roots` function and fixes a locking bug with global roots.
Cleanups
- ocaml-multicore/ocaml-multicore#567: Simplify some of the minor_gc code
  The `not_alone` variable has been cleaned up with a simplification to the minor_gc.c code.
- ocaml-multicore/ocaml-multicore#580: Remove struct domain
  The `caml_domain_state` is now the single source of domain information with the removal of `struct domain`. `struct dom_internal` is no longer leaking across the runtime.
- ocaml-multicore/ocaml-multicore#583: Removing interrupt queues
  The locking of `struct_interruptor` when receiving interrupts and the use of `struct interrupt` have been removed, simplifying the implementation of domains.
Sundries
- ocaml-multicore/ocaml-multicore#582: Make global state domain-local in Random, Hashtbl and Filename
  The default state in `Random`, `Hashtbl` and `Filename` is now domain-local.
- ocaml-multicore/ocaml-multicore#586: Make the state in Format domain-local
  The default state in `Format` is now domain-local.
- ocaml-multicore/ocaml-multicore#595: Implement `caml_alloc_dependent_memory` and `caml_free_dependent_memory`
  Dependent memory refers to the blocks of heap memory that depend on the GC (and finalizers) for deallocation. The `caml_alloc_dependent_memory` and `caml_free_dependent_memory` functions have been added to runtime/memory.c.
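To make the idea of domain-local state more concrete, here is a minimal sketch using the `Domain.DLS` interface. It only illustrates the pattern and is not the code from the PRs above; `counter_key` and `bump` are hypothetical names, and the example assumes a switch where `Domain.DLS.new_key` and `Domain.DLS.get` are available.

```ocaml
(* Each domain lazily receives its own counter the first time it looks
   up the key, so no lock is needed to update it. *)
let counter_key : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

(* Increment the calling domain's counter [n] times and return it. *)
let bump n =
  let c = Domain.DLS.get counter_key in
  for _ = 1 to n do
    incr c
  done;
  !c

let () =
  (* The spawned domain starts from a fresh counter, independent of
     whatever the main domain does with its own. *)
  let d = Domain.spawn (fun () -> bump 1000) in
  let here = bump 10 in
  let there = Domain.join d in
  Printf.printf "main domain: %d, spawned domain: %d\n" here there
```

A library that used to keep a single global `ref` can follow the same shape: replace the global with a key, and look the state up in the current domain on each use.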
Ecosystem
Ongoing
- ocaml-multicore/eventlog-tools#3: Use ocaml/setup-ocaml@v2
  An update to `.github/workflows/main.yml` to build for ocaml/setup-ocaml@v2.
- ocaml-multicore/parallel-programming-in-multicore-ocaml#7: Add a section on Domain-Local Storage
  The README.md file now includes a section on Domain-Local Storage.
- ocaml-multicore/eio#26: Grand Central Dispatch Backend
  The implementation of the Grand Central Dispatch (GCD) backend for Eio is a work-in-progress.
- ocaml-multicore/domainslib#34: Fix initial value accounting in `parallel_for_reduce`
  A patch to fix the initial value in `parallel_for_reduce`, as it was being accounted for multiple times.
- ocaml-multicore/domainslib#36: Switch to default `Random` module
  The library has been updated to use the default `Random` module, which now stores its state in Domain-Local Storage and can therefore be called from multiple domains. The Sandmark results are given below:
- ocaml-multicore/multicore-opam#56: Base-effects depends strictly on 4.12
  A query on the use of a strict 4.12.0 lower bound for OCaml in `base-effects.base/opam`.
- ocsigen/lwt#860: Lwt_domain: An interface to Multicore parallelism
  The `Lwt_domain` module has been ported to the domainslib Task pool for offloading computations to CPU cores using Multicore OCaml’s Domains. A few benchmark results obtained on an Intel Xeon Gold 5120 processor with 24 isolated cores are shown below:
Completed
Ocaml-Uring
The `ocaml-uring` repository contains bindings to `io_uring` for OCaml.
- ocaml-multicore/ocaml-uring#21: Add accept call
  The `accept` call has been added to uring along with the inclusion of the `unix` library as a dependency.
- ocaml-multicore/ocaml-uring#22: Add support for cancellation
  A `cancel` method has been added to request the cancellation of jobs. The queuing operations and tests have also been updated.
- ocaml-multicore/ocaml-uring#24: Sort out cast
  `Int_val` has been changed to `Long_val` to remove the need for a sign extension instruction on 64-bit platforms.
- ocaml-multicore/ocaml-uring#25: Fix test_cancel
  A `with_uring` function is added with a `queue_depth` argument to handle tests for cancellation.
- ocaml-multicore/ocaml-uring#26: Add `openat2`
  The `openat2` method has been added, giving access to all the Linux open and resolve flags.
- ocaml-multicore/ocaml-uring#27: Fine-tune C flags for better performance
  The CFLAGS have been updated for performance improvements. The following results are observed for the noop benchmark:

      Before: noop 10000 │ 1174227.1170 ns/run│
      After:  noop 10000 │  920622.5802 ns/run│

- ocaml-multicore/ocaml-uring#28: Don’t allow freeing the ring while it is in use
  The ring is added to a global set on creation and is cleaned up on exit. Also, invalid cancellation requests are checked before allocating a slot.
- ocaml-multicore/ocaml-uring#29: Replace iovec with cstruct and clean up the C stubs
  `readv` and `writev` now accept a list of Cstructs, which allows access to sub-ranges of bigarrays and working with multiple buffers. The handling of OOM errors has also been improved.
- ocaml-multicore/ocaml-uring#30: Fix remaining TODOs in API
  The `read` and `write` methods have been renamed to `read_fixed` and `write_fixed` respectively. `Region.to_cstruct` has been added as an alternative to creating a sub-bigarray. An exception is now raised if the user requests a larger-sized chunk.
- ocaml-multicore/ocaml-uring#31: Use `caml_enter_blocking_section` when waiting
  `caml_enter_blocking_section` and `caml_leave_blocking_section` are used when waiting, which allows other threads to execute and, in the case of Multicore OCaml, the GC to run.
- ocaml-multicore/ocaml-uring#32: Compile `uring` using the C flags from OCaml
  Use the OCaml C flags when building uring, and remove the unused dune file.
- ocaml-multicore/ocaml-uring#33: Prepare release
  The CHANGES.md, README.md, dune-project and uring.opam files have been updated to prepare for a release.
- ocaml-multicore/ocaml-uring#34: Convert `liburing` to subtree
  We now use a subtree instead of a submodule so that ocaml-uring can be submitted to the opam-repository.
Parallel Programming in Multicore OCaml
- ocaml-multicore/parallel-programming-in-multicore-ocaml#5: `num_domains` to `num_additional_domains`
  The documentation and code examples have been updated to now use `num_additional_domains` instead of `num_domains`.
- ocaml-multicore/parallel-programming-in-multicore-ocaml#6: Update latest information about compiler versions
  The compiler versions in the README.md have been updated to use 4.12 and its variants.
- ocaml-multicore/parallel-programming-in-multicore-ocaml#8: Nudge people to the default chunk_size setting
  The recommendation is to use the default `chunk_size` when using `parallel_for`, especially when the number of domains gets larger.
- ocaml-multicore/parallel-programming-in-multicore-ocaml#9: Eventlog section updates
  The `eventlog-tools` library can now be used for parsing trace files since Multicore OCaml includes CTF tracing support from trunk. The relevant information has been updated in the README.md file.
Eio
The `eio` library provides an effects-based parallel IO stack for Multicore OCaml.
Additions
- ocaml-multicore/eio#41: Add eio.mli file
  A `lib_eio/eio.mli` file containing modules for `Generic`, `Flow`, `Network`, and `Stdenv` has been added to the repository.
- ocaml-multicore/eio#45: Add basic domain manager
  The PR allows you to run a CPU-intensive task on another domain, and adds a mutex to `traceln` to avoid overlapping output.
- ocaml-multicore/eio#46: Add Eio.Time and allow cancelling sleeps
  Use the `psq` library instead of `bheap` to allow cancellations. The `Eio.Time` module has been added to `lib_eio/eio.ml`.
- ocaml-multicore/eio#53: Add `Switch.sub_opt`
  A new `Switch.sub_opt` implementation has been added to allow running a function with a new switch. Also, `Switch.sub` has been modified so that it is not a named argument.
- ocaml-multicore/eio#54: Initial FS abstraction
  A module `Dir` has been added to allow file system abstraction along with the ability to create files and directories. On Linux, it uses `openat2` and `RESOLVE_BENEATH`.
- ocaml-multicore/eio#56: Add `with_open_in`, `with_open_out` and `with_open_dir` helpers
  The `Eio.Dir` module now contains `with_open_in`, `with_open_out` and `with_open_dir` helper functions.
- ocaml-multicore/eio#58: Add `Eio_linux.{readv, writev}`
  The `Eio_linux.{readv, writev}` functions have been added to `lib_eio_linux/eio_linux.ml`, which uses the new OCaml-Uring API.
- ocaml-multicore/eio#59: Add `Eio_linux.noop` and a simple benchmark
  An `Eio_linux.noop` implementation has been added for benchmarking Uring dispatch.
- ocaml-multicore/eio#61: Add generic Enter effect to simplify scheduler
  An `Enter` effect has been introduced to simplify the scheduler operations, and this does not have much effect on the noop benchmark as illustrated below:
Improvements
- ocaml-multicore/eio#38: Rename Flow.write to Flow.copy
  The code and documentation have been updated to rename `Flow.write` to `Flow.copy` for better clarity.
- ocaml-multicore/eio#36: Use uring for accept
  The `enqueue_accept` function now uses `Uring.accept` along with the `effect Accept`.
- ocaml-multicore/eio#37: Performance improvements
  Optimises `Eunix.free` and processes completed events with `Uring.peek` for better performance results.
- ocaml-multicore/eio#48: Simplify `Suspend` operation
  The `Suspend` effect has been simplified by replacing the older `Await` and `Yield` effects with the code from Eio.
- ocaml-multicore/eio#52: Split Linux support out to `eio_linux` library
  `eunix` now has common code that is shared by different backends, and `eio_linux` provides a Linux io-uring backend. The tests and the documentation have been updated to reflect the change.
- ocaml-multicore/eio#57: Reraise exceptions with backtraces
  Added support to store a reference to a backtrace when a switch catches an exception. This is useful when you want to reraise the exception later.
- ocaml-multicore/eio#60: Simplify handling of completions
  The PR adds `Job` and `Job_no_cancel` in `type io_job` along with additional `Log.debug` messages.
Cleanups
- ocaml-multicore/eio#42: Merge fibreslib into eio
  The `Fibreslib` code is now merged with `eio`. You will now need to open `Eio.Std` instead of opening `Fibreslib`.
- ocaml-multicore/eio#47: Clean up the network API
  The network APIs have been updated with a few changes such as renaming `bind` to `listen`, replacing `Unix.shutdown_command` with our own type in the Eio API, and replacing `Unix.sockaddr` with a custom type.
- ocaml-multicore/eio#49: Remove `Eio.Private.Waiters` and `Eio.Private.Switch`
  The `Eio.Private.Waiters` and `Eio.Private.Switch` modules have been removed, and waiting is now handled using the Eio library.
- ocaml-multicore/eio#55: Some API and README cleanups
  The PR has multiple cleanups and documentation changes. The README.md has been modified to use `Eio.Flow.shutdown` instead of `Eio.Flow.close`, and a Time section has been added. The `Eio.Network` module has been renamed to `Eio.Net`. The `Time.now` and `Time.sleep_until` methods have been added to `lib_eio/eio.ml`.
Documentation
- ocaml-multicore/eio#43: Add design note about determinism
  The README.md documentation has been updated with a few design notes on Determinism.
- ocaml-multicore/eio#50: README improvements
  Updated README.md and added `doc/prelude.ml` for use with MDX.
Handling Cancellation
- ocaml-multicore/eio#39: Allow cancelling accept operations
  The PR now supports cancelling the server accept and read operations.
- ocaml-multicore/eio#40: Support cancelling the remaining Uring operations
  Cancellation requests for the `connect`, `wait_readable` and `await_writable` Uring operations are now supported.
- ocaml-multicore/eio#44: Fix read-cancel test
  The `ENOENT` value has been corrected to -2, and the documentation for cancelling the read request has been updated.
- ocaml-multicore/eio#51: Getting `EALREADY` from cancel is not an error
  Handle the `EALREADY` case in `lib_eunix/eunix.ml` where an operation got cancelled while in progress.
Sundries
- ocaml-multicore/eventlog-tools#2: Add a pausetimes tool
  An `eventlog_pausetimes` tool has been added to `eventlog-tools` that takes a directory of eventlog files and computes the mean and max pause times, as well as the distribution up to the 99.9th percentile. For example:

      ocaml-eventlog-pausetimes /home/engil/dev/ocaml-multicore/trace3/caml-426094-* name
      {
        "name": "name",
        "mean_latency": 718617,
        "max_latency": 33839379,
        "distr_latency": [191,250,707,16886,55829,105386,249272,552640,1325621,13312993,26227671]
      }

- ocaml-multicore/kcas#9: Backoff with `cpu_relax`
  The `Domain.Sync.{critical_section, wait_for}` calls have now been replaced with `Domain.Sync.cpu_relax`, which matches the implementation with lockfree.
- ocaml-multicore/retro-httpaf-bench#10: Add Eio benchmark
  The Eio benchmark has now been added to the retro-httpaf-bench GitHub repository.
- ocaml-multicore/retro-httpaf-bench#11: Do a recursive checkout in the CI build
  The `build_image.yml` workflow has been updated to perform a recursive checkout of the submodules for the CI build.
- domainslib#29: Task stealing with Chase Lev deques
  The task-stealing Chase Lev deques for scheduling tasks across domains are now merged, and show promising results on machines with 128 CPU cores.
- ocaml-multicore/multicore-opam#55: Add 0.3.0 release of domainslib
  The opam file for `domainslib.0.3.0` has been added to the multicore-opam repository.
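The kind of summary that the pausetimes tool above reports (mean, maximum and a percentile distribution) can be sketched as follows. This is not the code from eventlog-tools: `percentile` and `summarise` are hypothetical names, the nearest-rank percentile method is an assumption, and the sample latencies in the demo are made up purely for illustration.

```ocaml
(* Nearest-rank percentile of an already sorted, non-empty array. *)
let percentile sorted p =
  let n = Array.length sorted in
  let rank = int_of_float (ceil (p /. 100.0 *. float_of_int n)) in
  sorted.(max 0 (min (n - 1) (rank - 1)))

(* Return (mean, max, percentiles) for a non-empty array of latencies. *)
let summarise latencies =
  if Array.length latencies = 0 then invalid_arg "summarise: empty input";
  let sorted = Array.copy latencies in
  Array.sort compare sorted;
  let n = Array.length sorted in
  let mean = Array.fold_left ( + ) 0 sorted / n in
  let max_latency = sorted.(n - 1) in
  let distr = List.map (percentile sorted) [ 50.0; 90.0; 99.0; 99.9 ] in
  (mean, max_latency, distr)

let () =
  (* Illustrative values only, not measurements. *)
  let sample = [| 120; 95; 4000; 310; 150; 87; 2200; 101 |] in
  let mean, max_latency, distr = summarise sample in
  Printf.printf "mean=%d max=%d distr=[%s]\n" mean max_latency
    (String.concat "," (List.map string_of_int distr))
```

Reporting high percentiles next to the mean matters for pause times because a small number of long pauses can dominate tail latency while barely moving the average.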
Benchmarking
Ongoing
- ocaml-bench/sandmark-nightly#1: Cannot alter comparison input values
  The `Timestamp` and `Variant` fields in the dropdown option in the `parallel_nightly.ipynb` notebook get reset when recomputing the whole workbook.
- ocaml-bench/sandmark#230: Build for 4.13.0+trunk with dune.2.8.1
  The `ocaml-migrate-parsetree.2.2.0` and `ppxlib.0.22.2` packages are now available for 4.13.0+trunk, and we are currently porting the Irmin Layers benchmark in Sandmark from using Irmin 2.4 to 2.6.
- ocaml-bench/sandmark#231: View results for a set of benchmarks in the nightly notebooks
  A feature request to filter the list of benchmarks when using the Sandmark Jupyter notebooks.
- ocaml-bench/sandmark#233: Update pausetimes_multicore to fit with the latest Multicore changes
  The pausetimes are now updated for both the 4.12.0 upstream and 4.12.0 Multicore branches to use the new Common Trace Format (CTF). The generated graphs for both the sequential and parallel pausetime results are illustrated below:
- ocaml-bench/sandmark#235: Update selected benchmarks as a set for baseline benchmark
  The baseline benchmark for comparison should only be one from the user-selected benchmarks in the Jupyter notebooks.
- ocaml-bench/sandmark#236: Implement pausetimes support in sandmark_nightly
  The sequential and parallel pausetimes graph results need to be implemented in the Sandmark nightly Jupyter notebooks. The results are similar to Figures 10 and 12 in the Retrofitting Parallelism onto OCaml, ICFP 2020 paper.
- ocaml-bench/sandmark#237: Run sandmark_nightly on a larger machine
  The testing of Sandmark nightly sequential and parallel benchmark runs has been done on a 24-core machine, and we would like to deploy the same on a 64+ core machine to benefit from the recent improvements to Domainslib.
- ocaml-bench/sandmark#241: Switch to default Random module
  An ongoing discussion on whether to switch to using `Random.State` for the sequential Minilight, global roots micro-benchmarks and Evolutionary Algorithm.
Completed
- ocaml-bench/sandmark#232: `num_domains` → `num_additional_domains`
  The benchmarks have been updated to now use `num_additional_domains`, to be consistent with the naming in Domainslib.
- ocaml-bench/sandmark#239: Port grammatrix to Task pool
  The Multicore Grammatrix benchmark has now been ported to use Domainslib Task pool. The time and speedup graphs are given below:
OCaml
Ongoing
- ocaml/ocaml#10039: Safepoints
  The PR is currently being tested and evaluated for both ARM64 and PowerPC architectures, in particular the branch relaxations applied to `Ipoll` instructions.
Our thanks to all the OCaml users, developers and contributors in the community for their continued support to the project. Stay safe!