Multicore OCaml: July 2021

Welcome to the July 2021 Multicore OCaml monthly report! This month’s update along with the previous updates has been compiled by me, @ctk21, @kayceesrk and @shakthimaan. As August is usually a period of downtime in Europe, the next update may be merged with the September one in a couple of months (but given our geographically diverse nature now, if enough progress happens in August I’ll do an update).

The overall status of the multicore efforts is right on track: our contributions to the next OCaml release have been incorporated in 4.13.0~alpha2, and our focus remains on crushing incompatibilities and bugs to generate domains-only parallelism patches suitable for upstream review and release. As a lower priority activity, we continue to develop the experimental “effects-based” IO stack, which will feature in the upcoming virtual OCaml Workshop at ICFP in August 2021.

The 4.12.0+domains trees continue to see a tail of bugs being steadily fixed. After last month’s call, we saw a number of external contributors step up to submit fixes in addition to the multicore and core OCaml teams. We would like to acknowledge and thank them!

  • @emillon (Etienne Millon) for running the Jane Street core v0.14 test suite with 4.12.0+domains and sharing the test results (and finding a multicore GC edge case bug while at it).
  • @Termina1 (Vyacheslav Shebanov) for testing the compilation of batteries 3.30 with Multicore OCaml 4.12.0+domains.
  • @nbecker (Nils Becker) for reporting on parallel_map and parallel_scan for domainslib.
  • Filip Koprivec for identifying a memory leak when using flush_all with ocamlc with 4.12.0+domains.

All of these fixes, combined with some big-ticket compatibility changes (listed below) are getting me pretty close to using 4.12.0+domains as my daily OCaml opam switch of choice. I encourage you to also give it a try and report (good or bad) results on the multicore OCaml tracker. If these sorts of problems grab your attention, then Segfault Systems is hiring in India to work with @kayceesrk and the team there on multicore OCaml.

For benchmarking, the Jupyter notebooks for the Sandmark nightly benchmark runs have been updated, and we continue to test the Sandmark builds for the 4.12+ variants and 4.14.0+trunk. Progress has been made on integrating the current-bench OCurrent pipeline with the Sandmark 2.0 -alpha branch changes to reproduce the current Sandmark functionality, which will allow GitHub PRs to be benchmarked systematically before being merged.

As always, the Multicore OCaml ongoing and completed tasks are listed first, which are then followed by the updates from the Ecosystem libraries. The Sandmark nightly build efforts, benchmarking updates and relevant current-bench tasks are then mentioned. Finally, the update on the upstream OCaml Safepoints PR is provided for your reference.

Multicore OCaml

Ongoing

CI Compatibility

  • ocaml-multicore/ocaml-multicore#602
    Inclusion of most of OCaml headers results in requiring pthread

    The inclusion of multiple nested header files requires pthread and
    the decompress testsuite fails.

  • ocaml-multicore/ocaml-multicore#607
    caml_young_end is not a value * anymore

    An inconsistency observed in the CI where caml_young_end is now a
    char * instead of value *.

Crashes

Package Builds

Upstream

  • ocaml-multicore/ocaml-multicore#573
    Backport trunk safepoints PR to multicore

    The Safepoints implementation is being backported to Multicore
    OCaml. The initial test results of running Sandmark on a large Xen2
    box are shown below:


  • ocaml-multicore/ocaml-multicore#617
    Some of the compatibility macros are not placed in the same headers as in upstream OCaml

    The compatibility layer introduced for GC statistics needs to be
    consistent with trunk.

  • ocaml-multicore/ocaml-multicore#618
    Review io.c for thread-safety and add parallel tests

    The thread-safety fixes in io.c require a review, and additional
    tests need to be added for them.

  • ocaml-multicore/ocaml-multicore#623
    Exposing caml_channel_mutex_* hooks

    A draft PR to bring the caml_channel_mutex_* interfaces from trunk
    to Multicore OCaml.

Sundries

Completed

Enhancements

  • ocaml-multicore/ocaml-multicore#601
    Domain better participants

    The O(n_running_domains) work in domain creation and the
    O(Max_domains) iterations in STW signalling have been removed.

  • ocaml-multicore/ocaml-multicore#605
    Eventlog event for condition wait

    A new event has been added to indicate when a domain is blocked at
    Condition.wait. This is useful for debugging any imbalance in task
    distribution in domainslib.

Upstream

  • ocaml-multicore/ocaml-multicore#584
    Modernise signal handling

    The Multicore OCaml signals implementation is now closer to that of
    upstream OCaml.

  • ocaml-multicore/ocaml-multicore#600
    Expose a few more GC variables in headers

    The caml_young_start, caml_young_limit and caml_minor_heap_wsz
    variables have now been defined in the runtime.

  • ocaml-multicore/ocaml-multicore#612
    Make intern and extern work with Multicore

    The upstream changes to intern and extern have now been incorporated
    to work with the Multicore OCaml runtime.

Fixes

  • ocaml-multicore/ocaml-multicore#604
    Fix unguarded caml_skiplist_empty in caml_scan_global_young_roots

    A patch that fixes a locking bug with global roots observed on a Mac
    OS CI with parallel/join.ml.

  • ocaml-multicore/ocaml-multicore#621
    otherlibs: encode_terminal_status does not set all fields

    A minor fix for an error introduced when moving from
    caml_initialize_field to caml_initialize in otherlibs.

  • ocaml-multicore/ocaml-multicore#628
    In link_channel, channel->prev should be set to NULL

    A PR to fix the memory leak when using flush_all with ocamlc as
    reported by Filip Koprivec.

  • ocaml-multicore/ocaml-multicore#629
    Backtrace last exn is val unit

    A fix for the crash reported when running core’s test suite:
    backtrace_last_exn is now cleared to Val_unit in
    runtime/backtrace.c.

Ecosystem

Ongoing

  • ocaml-multicore/ocaml-uring#36
    Update to cstruct 6.0.1

    ocaml-uring is now updated to use Cstruct.shiftv with the upgrade
    to cstruct.6.0.1.

  • ocaml-multicore/domainslib#37
    parallel_map

    A request by @nbecker to provide a parallel_map function over
    arrays having the following signature:

    val parallel_map : Domainslib.Task.pool -> ('a -> 'b) -> 'a array -> 'b array
    
  • ocaml-multicore/domainslib#38
    parallel_scan rejects arrays not larger than pool size

    An “index out of bounds” exception is thrown for
    Task.parallel_scan with arrays not larger than the pool size as
    reported by @nbecker.
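
To make the parallel_map request above more concrete, here is a minimal sketch of such a function. It is not Domainslib's implementation. To stay within the standard library it takes a hypothetical ~num_domains argument instead of a Domainslib.Task.pool, and it spawns fresh domains on every call, which a pool-based version would avoid. It assumes a runtime that provides Domain.spawn and Domain.join.

```ocaml
(* Hypothetical sketch, not Domainslib's code: takes a domain count instead
   of a Domainslib.Task.pool so that it depends only on the stdlib. *)
let parallel_map ~num_domains (f : 'a -> 'b) (a : 'a array) : 'b array =
  let n = Array.length a in
  if n = 0 then [||]
  else begin
    let num_domains = max 1 (min num_domains n) in
    (* Seed the result with f a.(0) so that no dummy 'b value is needed. *)
    let result = Array.make n (f a.(0)) in
    (* Size of each contiguous slice, rounded up. *)
    let chunk = (n + num_domains - 1) / num_domains in
    let worker d () =
      (* Index 0 is already filled in, so slices start at 1 at the earliest. *)
      let lo = max 1 (d * chunk) in
      let hi = min n ((d + 1) * chunk) - 1 in
      for i = lo to hi do
        result.(i) <- f a.(i)
      done
    in
    let domains = List.init num_domains (fun d -> Domain.spawn (worker d)) in
    List.iter Domain.join domains;
    result
  end
```

Each domain writes to a disjoint slice of the result array, so the workers need no locking; f itself must still be safe to call from several domains at once.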

Completed

  • ocaml-multicore/eventlog-tools#4
    Add domain/condition_wait event

    The lib/consts.ml file in eventlog-tools now includes the
    domain/condition_wait event.

  • ocaml-multicore/domainslib#34
    Fix initial value accounting in parallel_for_reduce

    The initial value of parallel_for_reduce has been fixed so as to
    not be accounted multiple times.
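
As an illustration of the kind of problem described for parallel_for_reduce, the sequential sketch below reduces a range in chunks. The chunks stand in for the pieces of work a parallel reduce hands out. All function names here are hypothetical and this is not the Domainslib code; it only shows why the initial value should enter the computation once rather than once per chunk.

```ocaml
(* Fold a non-empty index range [start, finish] without any initial value. *)
let reduce_chunk op start finish body =
  let acc = ref (body start) in
  for i = start + 1 to finish do
    acc := op !acc (body i)
  done;
  !acc

(* Correct variant: init is combined exactly once, at the outermost level.
   Assumes chunk_size >= 1. *)
let chunked_reduce ~chunk_size op init ~start ~finish body =
  let rec go acc lo =
    if lo > finish then acc
    else
      let hi = min finish (lo + chunk_size - 1) in
      go (op acc (reduce_chunk op lo hi body)) (hi + 1)
  in
  go init start

(* Buggy variant: every chunk is also seeded with init, so init is
   counted once per chunk in addition to the outer use. *)
let chunked_reduce_buggy ~chunk_size op init ~start ~finish body =
  let rec go acc lo =
    if lo > finish then acc
    else
      let hi = min finish (lo + chunk_size - 1) in
      let partial = op init (reduce_chunk op lo hi body) in
      go (op acc partial) (hi + 1)
  in
  go init start
```

Summing 1 to 10 with an initial value of 100 and a chunk size of 5 gives 155 with the first variant and 355 with the second, because the second counts the initial value three times.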

Eio

The eio library provides an effects-based parallel IO stack for
Multicore OCaml.

Ongoing
  • ocaml-multicore/eio#68
    WIP: Add eio_luv backend

    A work-in-progress to use luv, which provides OCaml/Reason bindings
    to libuv, as a cross-platform backend for eio.

Completed
  • ocaml-multicore/eio#62
    Update to latest MDX to fix exception reporting

    Dune has been updated to 2.9 along with necessary changes for
    exception reporting with MDX.

  • ocaml-multicore/eio#63
    Update README

    A documentation update specifying the following steps required to
    manually pin the effects version of ppxlib and
    ocaml-migrate-parsetree.

    opam switch create 4.12.0+domains+effects --repositories=multicore=git+https://github.com/ocaml-multicore/multicore-opam.git,default
    opam pin add -yn ppxlib 0.22.0+effect-syntax
    opam pin add -yn ocaml-migrate-parsetree 2.1.0+effect-syntax
    
  • ocaml-multicore/eio#64
    Improvements to traceln

    Enhancements to traceln to make it an Effect along with changes to
    trace output and addition of tests.

  • ocaml-multicore/eio#65
    Add Flow.read_methods for optimised reading

    The addition of read_methods in the Flow module as a faster
    alternative to reading into a buffer.

  • ocaml-multicore/eio#66
    Allow cancelling waiting for a semaphore

    An update to lib_eio/semaphore.ml to allow cancelling a wait on a
    semaphore.

  • ocaml-multicore/eio#67
    Add more generic exceptions

    The inclusion of generic exceptions to avoid depending on
    backend-specific exceptions. The tests have also been updated.

Benchmarking

Sandmark Nightly

Ongoing

  • ocaml-bench/sandmark-nightly#4
    Parallel notebook pausetimes graphing for navajo results throws an error

    The parallel Jupyter notebook for pausetimes throws a ValueError
    that needs to be investigated.

  • ocaml-bench/sandmark-nightly#5
    Status of disabled benchmarks

    The alt-ergo, frama-c, and js_of_ocaml benchmark results that
    were disabled from the Jupyter notebooks have to be tested with
    recent versions of Multicore OCaml.

  • ocaml-bench/sandmark-nightly#6
    Parallel scalability number on navajo look odd

    The parallel performance numbers on the navajo build server for
    scalability will need to be reviewed and the experiments repeated
    and validated.

  • ocaml-bench/sandmark-nightly#7
    Use col_wrap as 3 instead of 5 in the normalised results in parallel notebook

    For better readability, it is recommended to use col_wrap as 3 in
    the normalised results in the parallel notebook.

  • ocaml-bench/sandmark-nightly#8
    View results for a set of benchmarks in the nightly notebooks

    A feature request to filter benchmarks by name or by tags when used
    with Jupyter notebooks.

  • ocaml-bench/sandmark-nightly#9
    Static HTML pages for the recent results

    The benchmark results from the most recent build runs should be used
    to generate static HTML reports for review and analysis.

Completed

  • ocaml-bench/sandmark-nightly#2
    Timestamps are not sorted in the parallel_nightly notebook

    The listing of timestamps in the drop-down option is now sorted.


Sandmark

Ongoing

  • ocaml-bench/sandmark#243
    Add irmin tree benchmark

    A request to add the Irmin tree.ml benchmark to Sandmark, including
    necessary dependencies and data files.

  • ocaml-bench/sandmark#245
    Add dune.2.9.0

    An update to dune.2.9.0 in order to build coq with Multicore OCaml
    on Sandmark.

  • ocaml-bench/sandmark#247
    Sandmark breaks on OCaml 4.14.0+trunk

    The Sandmark build for OCaml 4.14.0+trunk needs to be resolved as we
    begin upstreaming more Multicore OCaml changes.

  • ocaml-bench/sandmark#248
    coq fails to build

    The coq package is failing to build with 4.12.0+domains+effects
    with Sandmark on navajo server.

Completed

  • ocaml-bench/sandmark#233
    Update pausetimes_multicore to fit with the latest Multicore changes

    The Multicore pausetimes have now been updated for the 4.12.0
    upstream and 4.12.0 branches which now use the new Common Trace
    Format (CTF).

  • ocaml-bench/sandmark#235
    Update selected benchmarks as a set for baseline benchmark

    You now have the option to only filter from the user selected
    variants in the Jupyter notebooks.

  • ocaml-bench/sandmark#237
    Run sandmark_nightly on a larger machine

    The Sandmark nightly builds now run on a 64+ core machine to benefit
    from the improvements to Domainslib.

  • ocaml-bench/sandmark#240
    Add navajo specific parallel config.json file

    A navajo server-specific run_config.json file has been added to
    Sandmark to run Multicore parallel benchmarks.

  • ocaml-bench/sandmark#242
    Add commentary on grammatrix

    A documentation update for the grammatrix benchmark on customised
    task distribution via channels and the use of parallel_for.

  • ocaml-bench/sandmark#244
    Add chrt to pausetimes_multicore wrapper

    The use of chrt -r 1 in paramwrapper is required with
    pausetimes_multicore to use the taskset arguments.

  • ocaml-bench/sandmark#246
    Add trunk build to CI

    The .drone.yml file has now been updated to include 4.14.0+stock
    trunk build for the CI.

current-bench

Ongoing

  • ocurrent/current-bench#117
    Read stderr from the docker container

    We are able to run Sandmark-2.0 -alpha branch with current-bench
    now, and it is useful to view the error output when running with
    Docker containers.

  • ocurrent/current-bench#146
    Replicate ocaml-bench-server setup

    A request to dynamically pass the Sandmark benchmark target commands
    to current-bench in order to create pipelines.

OCaml

Completed

  • ocaml/ocaml#10039
    Safepoints

    The PR has been cherry-picked on 4.13 and finally merged with
    upstream OCaml.

We would like to thank all the OCaml users, developers and contributors in the community for their valuable time and support to the project. Stay safe and have a great summer if you are northern hemispherically based!

Acronyms

  • AFL: American Fuzzy Lop
  • CI: Continuous Integration
  • CTF: Common Trace Format
  • GC: Garbage Collector
  • GCC: GNU Compiler Collection
  • GTK: GIMP ToolKit
  • HTML: HyperText Markup Language
  • IO: Input/Output
  • OPAM: OCaml Package Manager
  • OS: Operating System
  • PR: Pull Request
  • STW: Stop The World

this made me curious, so I tried to build containers on 4.12+domains. It worked almost the first time, modulo a small GC-related fix for the test suite. :ok_hand:


TYSM! I wait for these from one month to the next.
They’re always an interesting read, especially going through the issue/PR links and seeing the discussions. I admit many things go over my head, but it gives me an appreciation for the work put into this nonetheless :pray:


Nice to hear! As a next step, I’d be interested to see if the test suite continues to pass if you run the sequential suite in parallel with multiple Domains. That’ll help shake out Stdlib safety bugs and any mutable global state that might be lurking in Containers itself.
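
A rough sketch of what running a sequential test from several domains at once could look like, using only the stdlib. The test body here is a hypothetical stand-in that touches only local state; a real suite would plug in its own test functions, and any shared mutable state they reach is exactly what this kind of run is meant to expose.

```ocaml
(* Hypothetical stand-in for one sequential test: it only uses local state. *)
let sequential_test () =
  let tbl = Hashtbl.create 16 in
  for i = 1 to 1_000 do
    Hashtbl.replace tbl i (i * i)
  done;
  Hashtbl.length tbl = 1_000 && Hashtbl.find tbl 10 = 100

(* Run the same test concurrently in [n] domains and collect the outcomes. *)
let run_in_parallel n test =
  let domains = List.init n (fun _ -> Domain.spawn test) in
  List.map Domain.join domains
```

For example, `List.for_all Fun.id (run_in_parallel 4 sequential_test)` reports whether every copy passed.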

Glad it’s of use! Don’t hesitate to post about something you’d like to gain more of an understanding of. Even if the developers can’t post a detailed reply, we can often point to a resource where you could learn more.


Could you explain to a casual user who hasn’t looked into the internals of the multicore runtime, what is important to make atomic and what isn’t?

If I’m using libraries that use mutable data structures internally, should I take any precautions in multicore code?

Ok, as I understand now, it’s crucially important. :slight_smile:

I’ve done a quick experiment with soupault to enable processing pages in parallel (messy code ahead):

@@ -515,13 +515,22 @@ let main () =
       else Ok ()
     in
     (* Process normal pages and collect index data from them *)
-    let* index = Utils.fold_left
-      (fun acc p ->
-         let ie = _process_page [] widgets config settings p in
-         match ie with Ok None -> Ok acc | Ok (Some ie') -> Ok (ie' :: acc) | Error _ as err -> err)
-      []
-      page_files
+    let add_index_entry acc e =
+      match e with
+      | Ok None -> acc
+      | Ok (Some ie') -> (ie' :: acc)
+      | Error msg -> failwith msg
     in
+    let p = Domainslib.Task.setup_pool ~num_additional_domains:(2) in
+    let* index =
+      let page_array = Array.of_list page_files in
+      let acc = Atomic.make [] in
+      try Domainslib.Task.parallel_for p ~start:0 ~finish:((Array.length page_array) - 1)
+        ~body:(fun n -> let ie = _process_page [] widgets config settings page_array.(n) in let acc_new= add_index_entry (Atomic.get acc) ie in ignore (Atomic.set acc acc_new));
+        Ok (Atomic.get acc)
+      with Failure msg -> Error msg
+    in
+    let () = Domainslib.Task.teardown_pool p in

The result is:

  • Plugins fail randomly with non-existent syntax errors because Lua-ML’s lexer/parser isn’t re-entrant.
  • The logs are often garbled because the Logs library doesn’t account for thread safety either.

Now the question is: what’s the intended way to isolate such non-reentrant code?


If you have non-reentrant code, you need to figure out how much of it needs to be in a ‘critical section’ i.e. how much of it is really non-reentrant. You create a mutex for this purpose. Before calling the critical section code you’ll lock the mutex, and when it’s done, you’ll unlock it. It’s not very efficient - only one domain will be able to run the critical section code at a time - but it’s the way to make it safe.
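
A minimal sketch of that pattern, assuming the stdlib Mutex and Domain modules are available. The render_unsafe function is a hypothetical stand-in for non-reentrant code such as a lexer with global state; it is not taken from Lua-ML or Logs.

```ocaml
(* Hypothetical non-reentrant code: it relies on one shared mutable buffer. *)
let buffer = Buffer.create 64

let render_unsafe name =
  Buffer.clear buffer;
  Buffer.add_string buffer "hello, ";
  Buffer.add_string buffer name;
  Buffer.contents buffer

let lock = Mutex.create ()

(* Only one domain at a time may run [f]; the mutex is released even if
   [f] raises. *)
let with_lock f =
  Mutex.lock lock;
  Fun.protect ~finally:(fun () -> Mutex.unlock lock) f

(* The safe entry point wraps the whole critical section. *)
let render name = with_lock (fun () -> render_unsafe name)

(* Four domains call [render] concurrently, 1000 times each. *)
let results =
  List.init 4 (fun i ->
      Domain.spawn (fun () ->
          List.init 1_000 (fun _ -> render (string_of_int i))))
  |> List.map Domain.join
```

Calling render_unsafe directly from several domains could interleave the buffer operations; going through render serialises them, at the cost of running that section on one domain at a time.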


Will Module Lwt_mutex work for this purpose?

I missed the fact that there’s a mutex module! That would work indeed.

Also worth mentioning that you can combine Atomic.get and Atomic.set with Atomic.exchange.
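
One caveat with the accumulation in the diff above: the new list is built from the old one, so if another domain updates acc between the Atomic.get and the Atomic.set, that update is lost. A compare-and-set retry loop is one way to avoid this. The sketch below uses only the stdlib Atomic and Domain modules; atomic_cons and collect are hypothetical helper names, not part of Domainslib or soupault.

```ocaml
(* Prepend [x] to the list held in [acc], retrying if another domain
   changed [acc] between the read and the write. *)
let rec atomic_cons acc x =
  let old = Atomic.get acc in
  if not (Atomic.compare_and_set acc old (x :: old)) then atomic_cons acc x

(* Each of [num_domains] domains adds [per_domain] distinct integers. *)
let collect ~num_domains ~per_domain =
  let acc = Atomic.make [] in
  let domains =
    List.init num_domains (fun d ->
        Domain.spawn (fun () ->
            for i = 1 to per_domain do
              atomic_cons acc ((d * per_domain) + i)
            done))
  in
  List.iter Domain.join domains;
  Atomic.get acc
```

With this loop no entry is dropped, although the order of the resulting list depends on how the domains interleave.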