Welcome to the October 2020 multicore OCaml report, compiled by @shakthimaan, @kayceesrk and of course myself. The [previous monthly](https://discuss.ocaml.org/tag/multicore-monthly) updates are also available for your perusal.
OCaml 4.12.0-dev: The upstream OCaml tree has been branched for the 4.12 release, and the OCaml readiness team is busy stabilising it with the ecosystem. The 4.12.0 development stream has made significant progress towards multicore support, especially with the runtime handling of naked pointers. The release will ship with a dynamic checker for naked pointers that you can use to verify that your own codebase is clean of them, as this will be a prerequisite for OCaml 5.0 and multicore compatibility. The checker is activated via the `--enable-naked-pointers-checker` configure option.
Convergence with upstream and multicore trees: The multicore OCaml trees have seen significant robustness improvements as we've converged our trees with upstream OCaml (possible now that the upstream architectural changes are synched with the requirements of multicore). In particular, the handling of global C roots is much better in multicore now as it uses the upstream OCaml scheme, and the GC colour scheme also exactly matches upstream OCaml's. This means that community libraries from opam work increasingly well when built with multicore OCaml (using the `no-effects-syntax` branch).
Features: Multicore OCaml now also uses domain local allocation buffers to simplify its internals. We are also working on benchmarking the IO subsystem: support for CPU parallelism has been added to the Lwt concurrency library, and the new Asynchronous Effect-based IO (aeio) has been refreshed with Multicore OCaml, Lwt, and httpaf in an http-effects library.
Benchmarking: The Sandmark benchmarking test suite has additional configuration options, and there are new proposals in that project to leverage the OCaml tools and ecosystem as much as possible.
As with previous updates, the ongoing and completed Multicore OCaml tasks are listed first, followed by improvements to the Sandmark benchmarking test suite. Finally, the upstream OCaml related work is mentioned for your reference.
Multicore OCaml
Ongoing
- ocaml-multicore/ocaml-multicore#422
  Simplify minor heaps configuration logic and masking

  The PR is a step towards using Domain local allocation buffers. A
  `Minor_heap_max` size is used to reserve the minor heaps area, and
  `Is_young` relies on a boundary check. The `Minor_heap_max` can be
  overridden using the OCAMLRUNPARAM environment variable.

- ocaml-multicore/ocaml-multicore#426
  Replace global roots implementation

  An effort to replace the existing global roots implementation to be
  in line with OCaml's `globroots`. The objective is to also have a
  per-domain skip list, and global orphans for when a domain is
  terminated.

- ocaml-multicore/ocaml-multicore#427
  Garbage Collector colours change backport

  The Garbage Collector colour scheme changes in the major collector
  have now been backported to Multicore OCaml. The `mark_entry` does
  not include `end`, `mark_stack_push` more closely resembles trunk,
  and `caml_shrink_mark_stack` has been adapted from trunk.

- ocaml-multicore/ocaml-multicore#429
  Fix a STW interrupt race

  The STW interrupt race in
  `caml_try_run_on_all_domains_with_spin_work` is fixed in this PR,
  where the `enter_spin_callback` and `enter_spin_data` fields of
  `stw_request` are initialized after we interrupt other domains.
Completed
Systhreads support
- ocaml-multicore/ocaml-multicore#381
  Reimplementing Systhreads with pthreads (Domain execution contexts)

  The re-implementation of Systhreads with pthreads has been completed
  for Multicore OCaml. The Domain Execution Context (DEC) is
  introduced, which allows multiple threads to run atop a domain.

- ocaml-multicore/ocaml-multicore#410
  systhreads: `caml_c_thread_register` and `caml_c_thread_unregister`

  The `caml_c_thread_register` and `caml_c_thread_unregister`
  functions have been reimported to systhreads. In Multicore OCaml,
  threads created by C code will be registered to the threads
  chaining of domain 0.
Domain Local Storage
- ocaml-multicore/ocaml-multicore#404
  Domain.DLS.new_key takes an initialiser

  `Domain.DLS.new_key` now accepts an initialiser argument that
  assigns an associated value to a key if it has not been initialised
  already. Also, `Domain.DLS.get` no longer returns an option value.

- ocaml-multicore/ocaml-multicore#405
  Rework Domain.DLS.get search function such that it no longer allocates

  `Domain.DLS.get` has been updated to avoid any memory allocation
  if the key already exists in the domain local storage. The PR also
  changes the `search` function to accept all inputs as arguments,
  instead of capturing them in a closure from the environment.
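The `Domain.DLS` changes above can be illustrated with a minimal sketch. It assumes a compiler that provides the `Domain` module with `Domain.DLS.new_key` taking an initialiser, as described in #404; the names `counter_key` and `bump` are hypothetical and chosen only for this example.

```ocaml
(* Sketch only: each domain lazily gets its own counter through the
   initialiser passed to new_key, so get returns a plain value rather
   than an option. *)
let counter_key : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

(* Increment the calling domain's counter and return its new value. *)
let bump () =
  let r = Domain.DLS.get counter_key in
  incr r;
  !r

let () =
  (* The child domain bumps its own counter twice. *)
  let child = Domain.spawn (fun () -> ignore (bump ()); bump ()) in
  let from_child = Domain.join child in
  (* The main domain's counter is untouched by the child. *)
  let from_main = bump () in
  Printf.printf "child=%d main=%d\n" from_child from_main
```

Because the storage is per domain, the child's two increments do not affect the value seen by the main domain.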
Lwt
- ocaml-multicore/multicore-opam#33
  Add lwt.5.3.0+multicore

  The Lwt.5.3.0 concurrency library has been added to support CPU
  parallelism with Multicore OCaml. A blog post introducing its
  installation and usage has been written by Sudha Parimala.

- The Asynchronous Effect-based IO builds with a recent Lwt, and the
  HTTP effects demo has been updated to work with Multicore OCaml,
  Lwt, and httpaf. The demo source code is available at the
  http-effects repo.
Sundries
- ocaml-multicore/ocaml-multicore#406
  Remove ephemeron usage of RPC

  The inter-domain mechanism is not required with the stop-the-world
  minor GC, and hence it has been removed from the ephemeron
  implementation. The PR also cleans up and simplifies the ephemeron
  data structure and code.

- ocaml-multicore/ocaml-multicore#411
  Fix typo for presume and presume_arg in `internal_variable_names`

  A minor typo bug fix to rename `Presume` and `Presume_arg` in
  `internal_variable_names.ml`.

- ocaml-multicore/ocaml-multicore#414
  Fix up `Ppoll` `semantics_of_primitives` entry

  The `semantics_of_primitives` entry for `Ppoll` has been fixed; the
  previous entry was causing flambda builds to remove poll points.

- ocaml-multicore/ocaml-multicore#416
  Fix callback effect bug

  The PR fixes a bug in how the C-to-OCaml callback prevents effects
  from crossing a C callback boundary. The stack parent is cleared
  before a callback, and restored afterwards. It also makes the stack
  parent a local root, so that the GC can see it inside the callback.
Benchmarking
Ongoing
Configuration
- ocaml-bench/ocaml-bench-scripts#12
  Add support for parallel multibench targets and JSON input

  The `RUN_CONFIG_JSON` and `BUILD_BENCH_TARGET` variables are now
  added and passed during run-time for the execution of parallel
  benchmarks. Default values are specified so that the serial
  benchmarks can still run without explicitly setting them.

- ocaml-bench/sandmark#180
  Notebook Refactoring and User changes

  A refactoring effort is underway to make the parallel benchmark
  interactive. The user accounts on The Littlest JupyterHub
  installation have direct access to the benchmark results produced
  from `ocaml-bench-scripts` on the system.

- ocaml-bench/sandmark#189
  Add environment support for wrapper in JSON configuration file

  The OCAMLRUNPARAM is now passed as an environment variable to the
  benchmarks during runtime, so that different parameter values can
  be used to obtain multiple results for comparison. The use case and
  the discussion are available in the "Running benchmarks with varying
  OCAMLRUNPARAM" issue. The environment variables can be specified in
  the `run_config.json` file, as shown below:

      {
        "name": "orun_2M",
        "environment": "OCAMLRUNPARAM='s=2M'",
        "command": "orun -o %{output} -- taskset --cpu-list 5 %{command}"
      }
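To confirm that a wrapper entry such as `orun_2M` really passes its environment through, a benchmark could print the setting it sees. The sketch below is illustrative and not part of Sandmark; it uses only the standard library (`Sys.getenv_opt` and `Gc.get`), and the helper name `describe_gc_env` is hypothetical.

```ocaml
(* Illustrative sketch, not part of Sandmark: report the OCAMLRUNPARAM
   value seen by the process and the minor heap size currently in use. *)
let describe_gc_env () =
  let param =
    match Sys.getenv_opt "OCAMLRUNPARAM" with
    | Some v -> v
    | None -> "(unset)"
  in
  (* minor_heap_size is reported in words, not bytes. *)
  let minor_words = (Gc.get ()).Gc.minor_heap_size in
  Printf.sprintf "OCAMLRUNPARAM=%s minor_heap_size=%d words" param minor_words

let () = print_endline (describe_gc_env ())
```

Running the same binary under two wrapper entries with different `s=` values should report different minor heap sizes, which makes it easy to see that the comparison runs are really using different settings.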
Proposals
- ocaml-bench/sandmark#159
  Implement a better way to describe tasklet cpulist

  The discussion to implement a better way to obtain the taskset list
  of cores for a benchmark run is still in progress. This is required
  to be able to specify hyper-threaded cores, NUMA zones, and the
  specific cores to use for the parallel benchmarks.

- ocaml-bench/sandmark#179
  [RFC] Classifying benchmarks based on running time

  A proposal to categorize the benchmarks based on their running time
  has been provided. The following classification types have been
  suggested:

  - `lt_1s`: Benchmarks that run for less than 1 second.
  - `lt_10s`: Benchmarks that run for at least 1 second, but less than 10 seconds.
  - `10s_100s`: Benchmarks that run for at least 10 seconds, but less than 100 seconds.
  - `gt_100s`: Benchmarks that run for at least 100 seconds.

  The PR for the same is available at "Classification of benchmarks".

- We are exploring the use of the `opam-compiler` switch environment
  to build the Sandmark benchmark test suite. The merge of systhreads
  compatibility support now enables us to install dune natively
  inside the switch environment, along with the other benchmarks. With
  this approach, we hope to modularize our benchmarking test suite,
  and converge to fully using the OCaml tools and ecosystem.
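The running-time classes proposed in ocaml-bench/sandmark#179 can be sketched as a small function. This is only an illustration of the thresholds listed in the proposal, not the code from the PR; the function name `classify_running_time` and the sample benchmark names and timings are hypothetical.

```ocaml
(* Hypothetical helper: map a running time in seconds to one of the
   class names suggested in the proposal. Lower bounds are inclusive,
   upper bounds exclusive. *)
let classify_running_time (seconds : float) : string =
  if seconds < 1.0 then "lt_1s"
  else if seconds < 10.0 then "lt_10s"
  else if seconds < 100.0 then "10s_100s"
  else "gt_100s"

let () =
  (* Made-up timings, purely to exercise the classifier. *)
  List.iter
    (fun (name, t) ->
      Printf.printf "%s: %s\n" name (classify_running_time t))
    [ ("bench_a", 0.4); ("bench_b", 12.5); ("bench_c", 250.0) ]
```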
Sundries
- ocaml-bench/sandmark#181
  Lock-free map bench

  An implementation of a concurrent hash-array mapped trie that is
  lock-free, and is based on Prokopec, A. et al. (2011). This
  cache-aware implementation benchmark is currently under review.

- ocaml-bench/sandmark#183
  Use crout_decomposition name for numerical analysis benchmark

  A couple of LU decomposition benchmarks exist in the Sandmark
  repository, and this PR renames the
  `numerical-analysis/lu_decomposition.ml` benchmark to
  `crout_decomposition.ml`. This addresses the issue "Rename
  lu_decomposition benchmark in numerical-analysis" and avoids any
  naming confusion between the two benchmarks, as their
  implementations are different.
Completed
- ocaml-bench/sandmark#177
  Display raw baseline numbers in normalized graphs

  The raw baseline numbers are now included in the normalized graphs
  in the sequential notebook output.

  [Figure: normalized `maxrsskb` graph with raw baseline numbers]

- ocaml-bench/sandmark#178
  Change to new Domain.DLS API with Initializer

  The `multicore-minilight` and `multicore-numerical` benchmarks have
  now been updated to use the new Domain.DLS API with initializer.

- ocaml-bench/sandmark#185
  Clean up existing effect benchmarks

  The PR ensures that the code compiles without any warnings, and adds
  a `multicore_effects_run_config.json` configuration file, and a
  `run_all_effect.sh` script to execute the same.

- ocaml-bench/sandmark#186
  Very simple effect microbenchmarks to cover code paths

  A set of four microbenchmarks to test the throughput of our effects
  system have now been added to the Sandmark test suite. These include
  `effect_throughput_clone`, `effect_throughput_val`,
  `effect_throughput_perform`, and `effect_throughput_perform_drop`.

- ocaml-bench/sandmark#187
  Implementation of "recursion" benchmarks for effects

  A collection of recursion benchmarks to measure the overhead of
  effects is now included in Sandmark. This is inspired by the
  [Manticore benchmarks](https://github.com/ManticoreProject/benchmark/).
OCaml
Ongoing
- ocaml/ocaml#9876
  Do not cache young_limit in a processor register

  The PR removes the caching of `young_limit` in a register for the
  ARM64, PowerPC and RISC-V ports. The Sandmark benchmarks are
  presently being tested on the respective hardware.

- ocaml/ocaml#9934
  Prefetching optimisations for sweeping

  The Sandmark benchmarking tests were run to analyse a couple of
  patches that optimise `sweep_slice`, and to evaluate the use of
  prefetching. The objective is to reduce cache misses during GC.
Completed
- ocaml/ocaml#9947
  Add a naked pointers dynamic checker

  The check for "naked pointers" (dangerous out-of-heap pointers) is
  now done at run-time, and tests for the three modes (naked pointers,
  naked pointers and dynamic checker, and no naked pointers) have been
  added in the PR.

- ocaml/ocaml#9951
  Ensure that the mark stack push optimisation handles naked pointers

  The PR adds a precise check on whether to push an object into the
  mark stack, to handle naked pointers.
We would like to thank all the OCaml users and developers in the community for their continued support, reviews and contributions to the project.
Acronyms
- AEIO: Asynchronous Effect-based IO
- API: Application Programming Interface
- ARM: Advanced RISC Machine
- CPU: Central Processing Unit
- DEC: Domain Execution Context
- DLS: Domain Local Storage
- GC: Garbage Collector
- HTTP: Hypertext Transfer Protocol
- JSON: JavaScript Object Notation
- NUMA: Non-Uniform Memory Access
- OPAM: OCaml Package Manager
- OS: Operating System
- PR: Pull Request
- RISC-V: Reduced Instruction Set Computing - V
- RPC: Remote Procedure Call
- STW: Stop-The-World