Multicore OCaml: September 2020

Welcome to the September 2020 Multicore OCaml report! This update, along with the previous monthly updates, has been compiled by @shakthimaan, @kayceesrk and @avsm.

Big news this month is that the systhreads compatibility support PR has been merged, which means that Dune (and other users of the Thread module) can compile out of the box. You can now compile the multicore OCaml fork conveniently using the new opam compiler plugin (see announcement):

opam update
opam compiler create "ocaml-multicore/ocaml-multicore:no-effect-syntax"
eval $(opam env)

This selects the branch of multicore OCaml that omits the experimental effect syntax, and thus works with the existing ppx ecosystem. It’s quite fun opam installing ecosystem packages and seeing them operate out of the box at long last. There are still a few rough edges to the thread compatibility support (mainly at the C compatibility layer, such as registering external C threads with the GC), but these will be worked out in the coming weeks. We’d like to hear of any build failures you encounter in the opam universe with this: please report them on https://github.com/ocaml-multicore/ocaml-multicore/issues

A number of performance improvements to the multicore OCaml GC and the Sandmark benchmarking project have also been completed through September:

  • we have now added the Kronecker implementation from the Graph500 benchmarks to Sandmark
  • an n-queen benchmark addition is in progress
  • benchmark runs now provide a count of the OCaml symbols as a code size metric
  • work on building Tezos with multicore OCaml, and integration with the Sandmark
    benchmarking test suite has also begun.

We have also begun an effort to port Lwt to take advantage of parallelism via Lwt_preemptive. Code samples have been written and test runs performed, and Sudha has written an introductory blog post about her early results. Note that this work doesn’t change the core behaviour of Lwt (a cooperative futures framework with no context switching between bind calls), but allows parallelism via explicit calls to background preemptive threads.

On the upstreaming efforts to OCaml, the 4.12 release will freeze earlier than usual in October, and so we finished submitting the last of the garbage collector colour changes and are aiming for the work on reliable safe points to go into OCaml 4.13. There have been a lot of runtime changes packed into 4.12 already, and so we will issue a call for testing when the release candidate of 4.12 is cut.

Onto the details of the PRs. As with the previous updates, the Multicore OCaml updates are listed first, followed by the enhancements to the Sandmark benchmarking project. Finally, the ongoing and completed upstream OCaml updates are listed for your reference.

Multicore OCaml

Ongoing

  • ocaml-multicore/domainslib#17
    Implement channels using Mutex and Condition Variables

    The lib/chan.ml sources have been updated to implement channels
    using Mutex and Condition Variables, and a
    LU_decomposition_multicore.exe test has been added for the same.

  • ocaml-multicore/ocaml-multicore#381
    Reimplementing systhreads with pthreads

    This PR is actively being reviewed for the use of pthreads in
    Multicore OCaml. It introduces Domain Execution Contexts (DEC),
    which allow multiple threads to run atop a domain.

  • ocaml-multicore/ocaml-multicore#394
    Changes to polling placement

    Polls are placed at the start of functions and on the back-edges
    of loops, instead of using Feeley’s algorithm. This is a
    work-in-progress.

  • ocaml-multicore/ocaml-multicore#401
    Do not handle interrupts recursively

    A domain local variable is introduced to prevent handling of
    interrupts recursively.

  • ocaml-multicore/ocaml-multicore#402
    Split handle_gc_interrupt into handling remote and polling sections

    A caml_poll_gc_work function is introduced that contains the GC
    work previously done in caml_handle_gc_interrupt. This lets
    stw_handler call poll without handling service interrupts, which
    could otherwise lead to unwanted recursion.

  • ocaml-multicore/ocaml-multicore#403
    Segmentation fault when building Tezos on Multicore 4.10.0 with no-effects-syntax

    This is an ongoing investigation into why the package
    tezos-embedded-protocol-packer in Tezos causes a segmentation
    fault when building with Multicore OCaml.
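
The poll placement described for ocaml-multicore/ocaml-multicore#394 above can be illustrated on a toy instruction list. This is only a sketch: the `instr` type and the `place_polls` function are invented for this example and are not the compiler's own representation, which works on a much richer intermediate form.

```ocaml
(* Toy IR, invented for illustration; not the compiler's representation. *)
type instr =
  | Op of string        (* any straight-line instruction *)
  | Poll                (* a safepoint check *)
  | Loop of instr list  (* a loop whose body is repeated *)

(* Add a poll on the back-edge of every loop, i.e. as the last
   instruction of the loop body, recursing into nested loops. *)
let rec poll_back_edges body =
  List.map
    (function
      | Loop inner -> Loop (poll_back_edges inner @ [ Poll ])
      | (Op _ | Poll) as i -> i)
    body

(* Add a poll at function entry, then handle the loops. *)
let place_polls body = Poll :: poll_back_edges body

let example =
  place_polls [ Op "a"; Loop [ Op "b"; Loop [ Op "c" ] ]; Op "d" ]
(* example =
   [Poll; Op "a"; Loop [Op "b"; Loop [Op "c"; Poll]; Poll]; Op "d"] *)
```

With this placement, every function call and every loop iteration passes through at least one poll, at the cost of possibly executing more polls than a more selective placement would.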

Completed

Domainslib

  • ocaml-multicore/domainslib#19
    Finer grain signalling with mutex condvar for Channels

    Using fine-grained locking with mutexes and condition variables
    improves performance at higher core counts, compared with a
    single mutex for all the signalling.
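
As a rough illustration of channels built from a mutex and a condition variable, here is a minimal unbounded channel in which each channel owns its own lock and condition variable rather than sharing a global one. It is a sketch, not the domainslib implementation: the `chan` type and the `make`, `send` and `recv` names are assumptions for this example, and it assumes OCaml 5, where `Mutex`, `Condition` and `Domain` are part of the standard library.

```ocaml
(* Sketch only; assumes OCaml 5 (Mutex, Condition, Domain in Stdlib).
   The names below are illustrative, not the domainslib API. *)
type 'a chan = {
  lock : Mutex.t;            (* protects [items] *)
  not_empty : Condition.t;   (* signalled whenever an item is pushed *)
  items : 'a Queue.t;
}

let make () =
  { lock = Mutex.create ();
    not_empty = Condition.create ();
    items = Queue.create () }

(* Push a value and wake one waiting receiver, if any. *)
let send c v =
  Mutex.lock c.lock;
  Queue.push v c.items;
  Condition.signal c.not_empty;
  Mutex.unlock c.lock

(* Block until a value is available, then pop it. The loop re-checks
   the queue because a wakeup does not guarantee it is non-empty. *)
let recv c =
  Mutex.lock c.lock;
  while Queue.is_empty c.items do
    Condition.wait c.not_empty c.lock
  done;
  let v = Queue.pop c.items in
  Mutex.unlock c.lock;
  v

let () =
  let c = make () in
  let producer =
    Domain.spawn (fun () -> for i = 1 to 3 do send c i done) in
  let total = recv c + recv c + recv c in
  Domain.join producer;
  assert (total = 6)
```

Because the lock and condition variable live inside each channel, senders and receivers on unrelated channels never contend with one another.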

Multicore OPAM

  • ocaml-multicore/multicore-opam#31
    Patch dune.2.7.1 for Multicore OCaml

    The opam file for dune.2.7.1 has been added along with a patch to
    bootstrap.ml to get it working for Multicore OCaml, thanks to
    Chaitanya Koparkar.

  • ocaml-multicore/multicore-opam#32
    Add ocamlfind-secondary dependency to dune

    The installation of dune requires ocamlfind-secondary as a
    dependency for dune.2.7.1, and has been added to the OPAM file.

Multicore OCaml

  • ocaml-multicore/ocaml-multicore#395
    Move to SPIN_WAIT for all spins and usleep in SPIN_WAIT

    The PR provides the SPIN_WAIT macro for all the busy spin wait
    loops, and uses caml_plat_spin_wait when busy waiting. This
    ensures that the same spin strategy is used in different places in
    the code.

  • ocaml-multicore/ocaml-multicore#397
    Relaxation of backup thread signalling

    The signalling to the backup thread from the mutator thread when
    leaving a blocking section is modified. It reduces the potential
    Operating System scheduling when re-entering OCaml.

  • ocaml-multicore/ocaml-multicore#400
    Demux eventlog for backup thread

    The events in the backup thread were emitting the same process ID as
    the main thread, and this PR separates them.

In the above illustration, the backup threads are active when the
main thread is waiting on a condition variable.

Benchmarking

Ongoing

  • ocaml-bench/sandmark#159
    Implement a better way to describe tasklet cpulist

    We need a cleaner way to obtain the taskset list of cores for a
    benchmark run when we are provided with a number of domains. We
    should be able to specify hyper-threaded cores, NUMA zones to use,
    and the specific cores to use for the parallel benchmarks.

  • ocaml-bench/sandmark#173
    Addition of nqueens benchmark to multicore-numerical

    A draft version of the classical n-queens benchmark has been added
    for review in Sandmark. This includes both the single-core and
    multicore implementations.
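
For readers unfamiliar with the n-queens problem mentioned above, a sequential solution counter might look like the following. This is a sketch written for this report, not the code submitted in ocaml-bench/sandmark#173; the `nqueens` function name is an assumption.

```ocaml
(* Count the placements of n non-attacking queens, one row at a time.
   [cols], [d1] and [d2] record the occupied columns and diagonals. *)
let nqueens n =
  let cols = Array.make n false in
  let d1 = Array.make (2 * n) false in  (* indexed by row + col *)
  let d2 = Array.make (2 * n) false in  (* indexed by row - col + n *)
  let rec place row =
    if row = n then 1
    else begin
      let count = ref 0 in
      for col = 0 to n - 1 do
        let a = row + col and b = row - col + n in
        if not (cols.(col) || d1.(a) || d2.(b)) then begin
          (* Occupy the square, recurse, then backtrack. *)
          cols.(col) <- true; d1.(a) <- true; d2.(b) <- true;
          count := !count + place (row + 1);
          cols.(col) <- false; d1.(a) <- false; d2.(b) <- false
        end
      done;
      !count
    end
  in
  place 0

let () = Printf.printf "8 queens: %d solutions\n" (nqueens 8)
```

A parallel variant could, for instance, give each domain its own copy of the three arrays and a subset of the first-row columns, then sum the partial counts.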

Completed

  • ocaml-bench/ocaml_bench_scripts#11
    Add support for configure option and OCAMLRUNPARAM

    The ocaml_bench_scripts has been updated to support passing
    configure options and OCAMLRUNPARAM when building and running the
    benchmarks in Sandmark.

  • ocaml-bench/sandmark#122
    Measurements of code size

    The output .bench JSON file produced from the benchmarks now
    includes a code size metric for the number of CAML symbols. A sample
    benchmark output is shown below:

    {"name":"knucleotide.", ... ,"codesize":276859.0, ...}
    

    The code size count for a few of the benchmarks is given below:

    | Benchmark  |   Count   |
    |------------|-----------|
    | alt-ergo   | 2_822_040 |
    | coqc       | 5_869_305 |
    | cpdf       | 1_131_376 |
    | nbody.exe  |   276_710 |
    | stress.exe |    84_061 |
    | fft.exe    |    38_914 |
    
  • ocaml-bench/sandmark#170
    Graph500 SEQ

    The Graph500 benchmark with a Kronecker graph generator has now been
    added to Sandmark. The generator builds three kernels for graph
    construction, Breadth First Search, and Single Source Shortest
    Paths.

  • ocaml-bench/sandmark#172
    Remove Base, Stdio orun dependency for trunk

    The orun sources in Sandmark have been updated to remove the
    dependency on both Base and Stdio. They have been replaced with
    functions from Stdlib, List, String and Str.

  • ocaml-bench/sandmark#174
    Cleanup our use of sudo for chrt

    The use of sudo has been removed from the Makefile for running
    parallel benchmarks, to avoid creating output files and directories
    that require root permissions for access. The use of
    RUN_BENCH_TARGET=run_orunchrt will execute the benchmarks using
    chrt -r 1. The user can give permissions to the chrt binary
    using:

    $ sudo setcap cap_sys_nice=ep /usr/bin/chrt
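
Returning to the Graph500 entry above (ocaml-bench/sandmark#170), the idea behind a Kronecker edge generator can be sketched as follows. This is not the Sandmark code: the `kronecker_edges` function and its labelled arguments are invented for this example, and the vertex and edge permutation steps of the reference generator are left out. The initiator probabilities A = 0.57, B = 0.19 and C = 0.19 are the ones given in the Graph500 specification.

```ocaml
(* Initiator probabilities from the Graph500 specification
   (the fourth quadrant gets the remaining 0.05). *)
let a = 0.57 and b = 0.19 and c = 0.19

(* Generate edge_factor * 2^scale edges over 2^scale vertices.
   Illustrative sketch; the permutation steps are omitted. *)
let kronecker_edges ~scale ~edge_factor ~seed =
  let rng = Random.State.make [| seed |] in
  let ab = a +. b in
  let c_norm = c /. (1.0 -. ab) in
  let a_norm = a /. ab in
  let n_edges = edge_factor * (1 lsl scale) in
  Array.init n_edges (fun _ ->
    let src = ref 0 and dst = ref 0 in
    for bit = 0 to scale - 1 do
      (* For each bit, pick one quadrant of the adjacency matrix. *)
      let i_bit = Random.State.float rng 1.0 > ab in
      let threshold = if i_bit then c_norm else a_norm in
      let j_bit = Random.State.float rng 1.0 > threshold in
      if i_bit then src := !src lor (1 lsl bit);
      if j_bit then dst := !dst lor (1 lsl bit)
    done;
    (!src, !dst))

let () =
  let edges = kronecker_edges ~scale:4 ~edge_factor:16 ~seed:42 in
  Printf.printf "generated %d edges\n" (Array.length edges)
```

The skew towards the first quadrant is what gives the generated graph a few very high-degree vertices, which is why it is a demanding input for the BFS and SSSP kernels.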
    

OCaml

Ongoing

  • ocaml/ocaml#9876
    Do not cache young_limit in a processor register

    The PR removes the caching of young_limit in a register for the
    ARM64, PowerPC and RISC-V ports, as it is problematic during polling
    for signals and inter-domain communication in Multicore OCaml.

Completed

  • ocaml/ocaml#9756
    Garbage collectors colour change

    The gray colour scheme in the Garbage Collector has been removed to
    facilitate merging with the Multicore OCaml collector. The existing
    benchmarks in the Sandmark suite that did overflow the mark stack
    are shown in the illustration below, and the change has little
    negative impact on them.

As always, we would like to thank all the OCaml developers and users in the community for their continued support and contribution to the project. Be well!

Acronyms

  • ARM: Advanced RISC Machine
  • BFS: Breadth First Search
  • DEC: Domain Execution Context
  • GC: Garbage Collector
  • JSON: JavaScript Object Notation
  • NUMA: Non-Uniform Memory Access
  • OPAM: OCaml Package Manager
  • OS: Operating System
  • PR: Pull Request
  • RISC-V: Reduced Instruction Set Computing - V
  • SSSP: Single Source Shortest Path

Curious to know: How will merging multicore as a whole work? Until now PRs that are relevant for multicore are being merged gradually into the main OCaml repo. These PRs tend to be smallish mostly (with some exceptions).

But the multicore project has thousands of lines of new code. How will that code be broken up into separate and digestible chunks to get merged? If each of those PRs goes through the traditional review process there could be further changes required in multicore itself. I understand everything is finely balanced so this could end up causing issues. Alternatively, will the multicore repo become the main repo (I guess that would be unlikely)?

Generally curious how the “end game” will play out…

That’s a good question @sid. Once all the various architectural dependencies are in place, the GC itself is just a few standalone C files. The current plan is for the core development team to do a focussed review of that particular PR, with plenty of time in the development cycle to facilitate changes. The intention is to branch to OCaml 5.0 when the domains-only support lands, with OCaml 4.x maintained as a longer term support branch while the 5.x series settles down. This is a major enough change that we are expecting some deviation from the release cadence of the past few years.

The multicore repo will definitely not become the main OCaml repo. We are reimplementing clean PRs for upstream OCaml, as the multicore repo history is long, storied and not especially useful.

It’s also just the beginning of the game :slight_smile: I’m very excited about some of the ongoing developments for post 5.0, such as fibres and effects. We’ll have more details (and paper drafts) on that as the research dust settles over the next few quarters.

All exciting news for sure!

Is it possible to create a “multicore#no-effect-syntax +static+musl” switch with the compiler plugin, so that I can distribute experimental builds to users?

I guess the following is the correct syntax for opam-compiler (not # but :).

opam compiler create 'ocaml-multicore/ocaml-multicore:no-effect-syntax'

Thanks for spotting my error @nekketsuuu – I have now edited the original post to reflect your corrected syntax.

@dmbaturin, a musl switch is indeed possible: you are best off submitting a PR to https://github.com/ocaml-multicore/multicore-opam to add a variant package. @emillon (the author of opam-compiler) can comment on whether or not he plans to support custom opam remotes for compiler descriptions in his plugin. This might be a case where we could add a convenience CLI shortcut to the plugin for multicore.

Note that musl switches can be created using CC=musl-gcc opam switch create <regular-switch> (or CC=x86_64-linux-musl-gcc when using a complete musl toolchain, for example from https://musl.cc/). This should also work for opam compiler create.

Unfortunately multicore OCaml doesn’t currently support musl: https://github.com/ocaml-multicore/ocaml-multicore/issues/266

That particular one does not seem possible because of the issue listed above, but yes generally that’s the kind of thing that we want to make easier with the plugin. I’d like to reuse a DSL such as this one to expose configure arguments so that you can ask for a +afl+fp compiler even if it’s not present in opam-repository (at the moment, the plugin does not use opam-repository at all).

That seems fixed now.

I am stoked for this! With liquidsoap, in particular, taking advantage of multicore would improve performance so much!

The scheduler we developed for the application is a task-based, thread-centric implementation pretty similar to what libev does with node. I’ve been waiting almost 10 years to be able to take advantage of this paradigm in OCaml… :sweat_smile:

There’s a test http server for it here: https://github.com/savonet/ocaml-duppy/blob/master/examples/http.ml. I’m really curious to see if it can finally scale up with multicore.

I’ve been able to compile it but, unfortunately, there seem to be some issues with the mutex try_lock call. I’ll file a report.

Thanks for the hard work, this is so exciting!
