Multicore OCaml: November 2021 with results of code review

Welcome to the November 2021 Multicore OCaml monthly report! This month’s update along with the previous updates have been compiled by me, @ctk21, @kayceesrk and @shakthimaan.

Core Team Code Review

In late November, the entire OCaml development team convened for a week-long code review and decision taking session on the multicore merge for OCaml 5.0. Due to the size of the patchset, we broke up the designs and presentations in five working groups. Here’s a summary of how each conversation went. As always, these decisions are subject to change from the core team as we discover issues, so please do not take any crucial decisions for your downstream projects on these. Our goal for publicising these is to hear about any corrections you might feel that we need to take on the basis of additional data that you might have from your own codebases.

For the purposes of brevity, we do not include the full minutes of the developer meetings. Overall, the multicore patches were deemed to be broadly sound and suitable, and we recorded the important decisions and tasks:

  • Pre-MVP: Tasks that need to be done before we make the PR to ocaml/ocaml in the coming month.
  • Post-MVP for 5.00: Tasks that need to be done on ocaml/ocaml before 5.00 release. These tasks will block the OCaml 5.00 release.
  • Post-5.00: Future looking tasks after 5.00 is released in early/mid-2022.

WG1: Garbage Collector

The multicore runtime alters the memory allocation and garbage collector to support multiple parallel threads of OCaml execution. It utilizes a stop-the-world parallel minor collector, a StreamFlow like multithreaded allocator and a mostly-concurrent major collector.

WG1 decided that compaction will not be in the 5.0 initial release, as our best fit allocator has shown that a good memalloc strategy obviates the need for expensive compaction. Of course, the multicore memory allocator is different from bestfit, so we are in need of community input to ensure our hypothesis involving not requiring compaction is sound. If you do see such a use case of your application heap becoming very fragmented when 5.0 is in beta, please get in touch.

Pre-MVP

  • remove any traces of no-naked-pointers checker as it is irrelevant in the pagetable-less multicore runtime.
  • running make parallel for the testsuite should work
  • move from assert to CAMLassert
  • How to do safepoints from C: add documentation on caml_process_pending_actions and a testsuite case for long-running C bindings to multicore
  • adopt the ephemeron bucket interface and do the same thing as 4.x OCaml trunk
  • check and document that NOT_MARKABLE can be used for libraries like ancient that want out of heap objects
  • check that we document what type of GC stats we return (global vs domain local) for the various stats

Post-MVP for 5.00

  • mark stack overflow fix, which shouldn’t affect most runtime allocation profiles

Post-5.00

  • statmemprof implementation
  • mark pre-fetching
  • investigate alternative minor heap implementations which maintain performance but cut virtual memory usage

WG2: Domains

Each domain in multicore can execute a thread of OCaml in parallel with other domains. Several additions are made to OCaml to spawn new domains, join domains that are terminating and provide domain local storage. There is a stdlib module Domain and the underlying runtime domain structures. A significant simplification in recent months is that the standard Mutex/Channel/Semaphore modules can be used instead of lower-level synchronisation primitives that were formerly available in Domain.

The challenge for the runtime structures is to accurately maintain the set of domains that must take part in stop-the-world sections in the presence of domain termination and spawning, as well as ensuring that a domain services stop-the-world requests when the main mutator is in a blocking call; this is handled using a backup thread signaled from caml_enter_blocking_section / caml_leave_blocking_section.

The multicore OCaml memory model was discussed, and the right scheme selected for arm64 (Table 5b from the paper). The local data race freedom (LDRF) property was agreed to be a balanced and predictable approach for a memory model for OCaml 5.0. We do likely need to depend on >C11 compiler for relaxed atomics in OCaml 5.0, so this will mean dropping Windows MSVC support for the MVP (but mingw will work).

Pre-MVP

  • Make domain id abstract and provide string_of_id
  • Document that initializing writes are ok using the Field macro with respect to the memory model. Also highlight that all writes need to use caml_modify (even immediates)
  • check that the selectgen ‘coeffect’ is correct for DLS.get
  • More comments needed for domain.c to help the reader:
    • around backup thread state machine and where things happen
    • domain spawn/join
  • comment/check why install_backup_thread is called in spawnee and spawner
  • check the reason why domain terminate is using a mutex for join (rather than a mutex, condvar pair)

Post-5.00

  • Provide a mechanism for the user to retrieve the number of processors available for use. This can be implemented by libraries as well.
  • add atomic mutable record fields
  • add arrays of atomic variables

WG3: Runtime multi-domain safety

Multicore OCaml supports systhreads in a backwards compatible fashion. The execution model remains the same, except transposed to domains rather than a single execution context.

Each domain will get its own threads chaining: this means that while only one systhread can execute at a time on a single domain (akin to trunk), many domains can still execute in parallel, with their systhreads chaining being independent. To achieve this, a thread table is employed to allow each domains to maintain their own independent chaining. Context switching now involves extra care to handle the backup thread. The backup thread takes care of GC duties when a thread is currently in a blocking section. Systhreads needs to be careful about when to signal it.

The tick thread, used to periodically force thread preemption, has been updated to not rely on signals (as the multicore signaling model does not allow this to be done efficiently). Instead, we rely on the interrupt infrastructure of the multicore runtime and trigger an “external” interrupt, that will call back into systhreads to force a yield.

The existing Dynlink API was designed decades ago for a web browser written in OCaml (called “mmm”) and is stateful. We’ll make it possible to call concurrently in the OCaml 5.0 MVP, but the WG3 decided to start redesigning the Dynlink API to be less stateful.

Code fragments are now stored in a lockfree skiplist to allow multiple threads to work on the codefrags structures concurrently in a thread-safe manner. Extra care is required on cleanup (i.e, freeing unused code fragments entries): this should only happen on one domain, and this is done at the end of a major cycle. For the interested, ocaml-multicore#672 is a recommended read to see the concurrent skiplist structure now used.

Signals in multicore have the following behaviour, with the WG3 deciding to change their behaviour to allow coalescing multiple signals from the perspective of the mutator:

  • A program with a single domain should have mostly the same signal behaviour as trunk. This includes the delivery of signals to systhreads on that domain.
  • Programs with multiple domains treat signals in a global fashion. It is not possible to direct signals to individual domains or threads, other than the control through thread sigmask. A domain recording a signal may not be the one executing the OCaml signal handler.

Frame descriptors modifications are now locked behind a mutex to avoid races if different threads were to try to apply changes to the frame table at the same time. Freeing up old frame tables is done at the end of a major cycle (which is a STW section) in order to be sure that no thread will be using this old frame table anymore.

Multicore OCaml contains a version of eventlog that is safe for multiple domains. It achieves this by having a separate CTF file per domain but this is an interim solution. We hope to replace this implementation with an existing prototype based on per-domain ring buffers which can be consumed programmatically from both OCaml and C. This will be a generalisation of eventlog, and so we should be able to remove the existing interface if it’s not widely adopted yet.

Pre-MVP

  • Rewrite intern.c so that it doesn’t do GC. This code is performance sensitive as the compiler reads the cmi files by unmarshaling them.
  • Ensure the m->waiters atomics in systhreads are correct and safe.
  • Write down options for Thread.exit to be discussed during or after merge, and what to do if just one domain exits while others continue to run. Should not be a blocking issue. Changing semantics is ok from vanilla trunk.
  • m->busy is not atomic anymore as of ocaml-multicore/ocaml-multicore#740, should be reviewed and merged.
  • Restrict Dynlink to domain 0 as it is a mutable interface and difficult to use concurrently.
  • Signals stack should move from counting to coalescing semantics.
  • Try to delay signal processing at domain spawn so that Caml_state is valid.
  • Remove total_signals_pending if possible.

Post-MVP for 5.00

  • Probe opam for eventlog usage (introduced in OCaml 4.13) to determine if removing it will break any applications.
  • Eventring merge is OK, eventlog API can be changed if functionality remains equivalent.
  • (could be post 5.00 as well) TLS for systhreads.

Post-5.00

  • Get more data on Dynlink usage and design a new API that is less stateful.
  • @xavierleroy suggested redesigning marshalling in light of the new allocator.

WG4: Stdlib changes

The main guiding principle in porting the Stdlib to OCaml 5.00 is that

  1. OCaml 5.00 does not provide thread-safety by default for mutable data structures and interfaces.
  2. OCaml 5.00 does ensure memory-safety (no crashes) even when stdlib is used in parallel by multiple domains.
  3. Observationally pure interfaces remain so in OCaml 5.00.

For OCaml libraries with specific mutable interfaces (e.g. Queue, Hashtbl, Stack, etc.) they will not be made domain-safe to avoid impacting sequential performance. Programs using parallelism will need to add their own lock safety around concurrent access to such modules. Modules with top-level mutable state (e.g. Filename, Random, Format, etc…) will be made domain-safe. Some, such as Random, are being extensively redesigned to use new approaches such as splittable prngs. The motivation for these choices and further discussion is found in the Multicore OCaml wiki page.

The WG4 also noted that we would accept alternative versions of mutable stdlib modules that are concurrent-safe (e.g. have a Concurrent.Hashtbl), and also hopes to see more lockfree libraries developed independently by the OCaml community. Overall, WG4 recognised the importance of community involvement with the process of porting OCaml libraries to parallel safety. We aim to add ocamldoc tags to the language to mark modules/functions safety, and hope to get this in the new unified package db at v3.ocaml.org ahead of OCaml 5.0.

Lazy

Lazy values in OCaml allow deferred computations to be run by forcing them. Once the lazy computation runs to completion, the lazy is updated such that further forcing fetches the result from the previous forcing. The minor GC also short-circuits forced lazy values avoiding a hop through the lazy object. The implementation of lazy uses unsafe operations from the Obj module.

The implementation of Lazy has been made thread-safe in OCaml 5.00. For single-threaded use, the Lazy module preserves backwards compatibility. For multi-threaded use, the Lazy module adds synchronization such that on concurrent forcing of an unforced lazy value from multiple domains, one of the domains will get to run the deferred computation while the other will get a new exception RacyLazy .

Random

With ocaml-multicore#582, we have domain-local PRNGs following closely along the lines of stock OCaml. In particular, the behaviour remains the same for sequential OCaml programs. But the situation for parallel programs is not ideal. Without explicit initialisation, all the domains will draw the same initial sequence.

There is ongoing discussion on splittable PRNGs in ocaml/RFCs#28, and a re-implementation of Random using the Xoshiro256++ PRNG in ocaml/ocaml#10701.

Format

The Format module has some hidden global state for implementing pretty-printing boxes. While the module has explicit API for passing the formatter state to the functions, there are predefined formatters for stdout , stderr and standard buffer, whose state is maintained by the module.

The Format module has been made thread-safe for predefined formatters. We use domain-local versions of formatter state for each domain, lazily switching to this version when the first-domain is spawned. This preserves the performance of single-threaded code, while being thread-safe for multi-threaded use case. See the discussion in ocaml/ocaml#10453 for a summary.

Mutex, Condition, Semaphore

The Mutex, Condition and Semaphore modules are the same as systhreads in stock OCaml. They now reside in stdlib . When systhreads are linked, the same modules are used for synchronization between systhreads.

Pre-MVP

  • Mark lazy as not thread safe.
    • Unify RacyLazy and Undefined
    • Remove domain-local unique token
    • Remove try_force
  • Add the Bucket module for ephemerons with a default sequential implementation as seen in OCaml 4.13.

Post-MVP for 5.00

  • Introduce ocamldoc tags for different concurrency safety
    • domain-safe
    • systhread-safe
    • fiber-safe
    • not-concurrency-safe (= !domain-safe || !systhread-safe || !fiber-safe) – also used as a placeholder for libraries and functions not analysed for concurrency.
  • Add documentation for memory model in the manual. Specifically, no values out of thin air – no need to precisely document the memory model aside from pointing to paper.
  • For Arg module, deprecate current but not whole module
  • remove ThreadUnix as a simple module that would no longer need Unix.
  • Dynlink should have a mutex inside it to ensure it doesnt crash especially in bytecode.

Post-5.00

  • Atomic arrays
  • Ephemerons reimplemented in terms of Bucket module.
  • Make disjoint the update of the lazy tag and marking by using byte-sized write and CAS.

WG5: Fibers

Fibers are the runtime system mechanism that supports effect handlers. The design of effect handlers in OCaml has been written up in the PLDI 2021 paper.The motivation for adding effect handlers and some more examples are found in these slides.

Programming with effect handlers

Effect handlers are made available to the OCaml programmer from stdlib/effectHandlers.ml (although this will likely be renamed Effect soon). The EffectHandlers module exposes two variants of effect handlers – deep and shallow. Deep handlers are like folds over the computation tree whereas shallow handlers are akin to case splits. With deep handlers, the handler wraps around the continuation, whereas in shallow handlers it doesn’t.

Here is an example of a program that uses deep handlers to model something analogous to the Reader monad.

open EffectHandlers
open EffectHandlers.Deep

type _ eff += Ask : int eff

let main () =
  try_with (fun _ -> perform Ask + perform Ask) ()
  { effc = fun (type a) (e : a eff) ->
      match e with
      | Ask -> Some (fun (k : (a,_) continuation) -> continue k 1)
      | _ -> None }

let _ = assert (main () = 2)

Observe that when we resume the continuation k , the subsequent effects performed by the computation are also handled by the same handler. As opposed to this, for the shallow handler doesn’t. For shallow handlers, we use continue_with instead of continue.

open EffectHandlers
open EffectHandlers.Shallow

type _ eff += Ask : int eff

let main () =
  let rec loop (k: (int,_) continuation) (state : int) =
    continue_with k state
    { retc = (fun v -> v);
      exnc = (fun e -> raise e);
      effc = fun (type a) (e : a eff) ->
        match e with
        | Ask -> Some (fun (k : (a, _) continuation) -> loop k 1)
        | _ -> None }
  in
  let k = fiber (fun _ -> perform Ask + perform Ask) in
  loop k 1

let _ = assert (main () = 2)

Observe that with a shallow handler, the recursion is explicit. Shallow handlers makes it easier to encode cases where state needs to be threaded through. For example, here is a variant of the State handler that encodes a counter:

open EffectHandlers
open EffectHandlers.Shallow

type _ eff += Next : int eff

let main () =
  let rec loop (k: (int,_) continuation) (state : int) =
    continue_with k state
    { retc = (fun v -> v);
      exnc = (fun e -> raise e);
      effc = fun (type a) (e : a eff) ->
        match e with
        | Next -> Some (fun (k : (a, _) continuation) -> loop k (state + 1))
        | _ -> None }
  in
  let k = fiber (fun _ -> perform Next + perform Next) in
  loop k 0

let _ = assert (main () = 3)

While this encoding is possible with deep handlers (by the usual State monad trick of building up a computation using a closure), it feels more natural with shallow handlers. In general, one can easily encode deep handlers using shallow handlers, but going the other way is challenging. With the typed effects work currently in development, the default would be shallow handlers and deep handlers would be encoded using the shallow handlers.

As a bit of history, the current implementation is tuned for deep handlers and has gathered optimizations over several iterations. If shallow handlers becomes more widely in the coming years, it may be possible to put in some tweaks that removes a few allocations. That said, the semantics of the deep and shallow handlers in this future implementation will remain the same as what is currently in OCaml 5.00 branch.

Post-MVP for 5.00

  • Add ARM64 backend
  • Documentation on the usage of effect handlers.
  • Current stack size should be the sum of the stack sizes of the stack of fibers. Currently, it only captures the top fiber size.
    • This is not straight-forward as it seems. Resuming continuations attaches a stack. Should we do stack overflow checks there? I’d not, as this would make resuming continuations slower. One idea might be to only do the stack overflow check at stack realloc, which catches the common case.

Post-5.00

  • Add support for compiling with frame pointers.

The November activities

That wraps up the mammoth code review summary, and significant decisions taken. Overall, we are full steam ahead for generating an OCaml 5.0 PR, although we do have our work cut out for us in the coming months! Now we continue with our regular report on what else happened in November.The ecosystem is continuing to evolve, and there are significant updates to Eio, the Effects-based parallel IO for OCaml.

Lwt.5.5.0 has been released that supports dispatching pure computations to multicore domains. The Sandmark benchmarking has now been updated to build for 5.00, and the current-bench tooling is being improved to better track the performance analysis and upstream merge changes.

As always, the Multicore OCaml updates are listed first, which contain the upstream efforts, documentation changes, and PR fixes. This is followed by the ecosystem updates to Eio and Tezos. The Sandmark, sandmark-nightly and current-bench tasks are finally listed for your kind reference.

Multicore OCaml

Ongoing

Upstream

  • ocaml-multicore/ocaml-multicore#669
    Set thread names for domains

    A patch that implements thread naming for Multicore OCaml. It
    provides an interface to name Domains and Threads differently. The
    changes have now been rebased with check-typo fixes.

  • ocaml-multicore/ocaml-multicore#733
    Improve the virtual memory consumption on Linux

    An ongoing design discussion on orchestrating the minor heap
    allocations of domains for virtual memory performance, domain spawn
    and termination, performance and safety of Is_young runtime usage,
    and change of minor heap size using Gc set.

  • ocaml-multicore/ocaml-multicore#735
    Add caml_young_alloc_start and caml_young_alloc_end in minor_gc.c

    caml_young_alloc_start and caml_young_alloc_end are not present
    in Multicore OCaml, and they should be the same as young_start and
    young_end.

  • ocaml-multicore/ocaml-multicore#736
    Decompress testsuite fails 5.0 because of missing pthread link flag

    An undefined reference to pthread_sigmask has been observed when
    -lpthread is not linked when testing the decompress testsuite.

  • ocaml-multicore/ocaml-multicore#737
    Port the new ephemeron API to 5.00

    An API for immutable ephemerons has been
    merged in trunk, and
    the respective changes need to be ported to 5.00.

  • ocaml-multicore/ocaml-multicore#740
    Systhread lifecycle

    The PR addresses the systhreads lifecycle in Multicore and provides
    fixes in caml_thread_domain_stop_hook, Thread.exit and
    caml_c_thread_unregister.

  • ocaml-multicore/ocaml-multicore#742
    Minor tasks from asynchronous review

    A list of minor tasks from the asynchronous review for OCaml 5.00
    release. The major tasks will have their own GitHub issues.

  • ocaml-multicore/ocaml-multicore#745
    Systhreads WG3 comments

    The commit names should be self-descriptive, use of non-atomic
    variables is preferred, and we should raise OOM when there is a
    failure to allocate thread descriptors.

  • ocaml-multicore/ocaml-multicore#748
    WG3 move gen_sizeclasses

    The PR moves runtime/gen_sizeclasses.ml to
    tools/gen_sizeclasses.ml and fixes check-typo issues.

  • ocaml-multicore/ocaml-multicore#750
    Discussing the design of Lazy under Multicore

    A ongoing discussion on the design of Lazy for Multicore OCaml that
    addresses sequential Lazy, concurrency problems, duplicated
    computations, and memory safety.

  • ocaml-multicore/ocaml-multicore#753
    C API to pin domain to C thread?

    A question on how to design an API that would allow creating a
    domain “pinned” to an existing C thread, from C.

  • ocaml-multicore/ocaml-multicore#754
    Improvements to emit.mlp organization

    The preproc_fun function should be moved to a target-independent
    module, and all the prologue code needs to be emitted in one place.

  • ocaml-multicore/ocaml-multicore#756
    RFC: Generalize the Domain.DLS interface to split PRNG state for child domains

    The PR demonstrates an implementation for a “proper” PRNG+Domains
    semantics where spawning a domain “splits” the PRNG state.

  • ocaml-multicore/ocaml-multicore#757
    Audit stdlib for mutable state

    An issue tracker for the status of auditing stdlib for mutable
    state. OCaml 5.00 stdlib will have to guarantee both memory and
    thread safety.

Documentation

  • ocaml-multicore/ocaml-multicore#741
    Ensure copyright headers are formatted properly

    The copyright headers in the source files must be neatly formatted
    using the new format. If the old format already exists, then the
    author, institution details must be added as shown below:

    /**************************************************************************/
    /*                                                                        */
    /*                                 OCaml                                  */
    /*                                                                        */
    /*          Xavier Leroy and Damien Doligez, INRIA Rocquencourt           */
    /*          <author's name>, <author's institution>                       */
    /*                                                                        */
    /*   Copyright 1996 Institut National de Recherche en Informatique et     */
    /*     en Automatique.                                                    */
    /*   Copyright <first year written>, <author OR author's institution>     */
    /*   Included in OCaml under the terms of a Contributor License Agreement */
    /*   granted to Institut National de Recherche en Informatique et en      */
    /*   Automatique.                                                         */
    /*                                                                        */
    /*   All rights reserved.  This file is distributed under the terms of    */
    /*   the GNU Lesser General Public License version 2.1, with the          */
    /*   special exception on linking described in the file LICENSE.          */
    /*                                                                        */
    /**************************************************************************/
    
  • ocaml-multicore/ocaml-multicore#743
    Unhandled exceptions should render better error message

    A request to output informative Unhandled_effect <EFFECT_NAME>
    error message instead of Unhandled in the compiler output.

  • ocaml-multicore/ocaml-multicore#752
    Document the current Multicore testsuite situation

    A documentation update on how to run the Multicore OCaml
    testsuite. The steps are as follows:

    $ make world.opt
    $ cd testsuite
    $ make all-enabled
    
  • ocaml-multicore/ocaml-multicore#759
    Rename type variables for clarity

    The type variables in stdlib/fiber.ml have been updated for
    consistency and clarity.

Sundries

  • ocaml-multicore/ocaml-multicore#725
    Blocked signal infinite loop fix

    A monotonic recorded_signals_counter has been introduced to fix
    the possible loop in caml_enter_blocking_section when no domain
    can handle a blocked signal. A signals_block.ml callback test has
    also been added.

  • ocaml-multicore/ocaml-multicore#730
    ocamlopt raise a stack-overflow compiling aws-ec2.1.2 and color-brewery.0.2

    A “Stack overflow” exception raised while compiling aws-ec2.1.2
    with 4.14.0+domains+dev0.

  • ocaml-multicore/ocaml-multicore#734
    Possible segfault when a new domain is signalled before it can initialize the domain_state

    A potential segmentation fault caused when a domain created by
    Domain.spawn receives a signal before it can reach its main
    entrypoint and initialize thread local data.

  • ocaml-multicore/ocaml-multicore#738
    Assertion violation when an external function and the GC run concurrently

    An Assertion Violation error message is thrown when Z3 tries to
    free GC cleaned up objects in the get_unsat_core function.

  • ocaml-multicore/ocaml-multicore#749
    Potential bug on Forward_tag short-circuiting?

    A bug when short-circuiting Forward_tag on values of type
    Obj.forcing_tag. Short-circuiting is disabled on values of type
    Forward_tag, Lazy_tag and Double_tag in the minor GC.

Completed

Upstream

Documentation

  • ocaml-multicore/ocaml-multicore/744
    Make cosmetic change to comments in lf_skiplist

    The comments in runtime/lf_skiplist.c have been updated with
    reference to the paper by Willam Pugh on “Skip Lists”.

  • ocaml-multicore/ocaml-multicore#746
    Frame descriptors WG3 comments

    The copyright headers have been added to
    runtime/frame_descriptors.c and runtime/frame_descriptors.h.

  • ocaml-multicore/ocaml-multicore#747
    Fix check typo for sync files

    The check-typo errors for sync.c and sync.h have been fixed.

  • ocaml-multicore/ocaml-multicore#755
    More fixes for check-typo

    The check-typo fixes for otherlibs/unix/fork.c,
    runtime/finalise.c, runtime/gc_ctrl.c, runtime/Makefile and
    runtime/caml/eventlog.h have been merged.

Sundries

  • ocaml-multicore/ocaml-multicore#720
    Improve ephemerons compatibility with testsuite

    The PR fixes weaktest.ml and also imports upstream changes to make
    ephemerons work with infix objects.

  • ocaml-multicore/ocaml-multicore#731
    AFL: Segfault and lock resetting (Fixes #497)

    A fix to get AFL-instrumentation working again on Multicore
    OCaml. The PR also changes caml_init_domains to use
    caml_fatal_error consistently.

Ecosystem

Ongoing

  • ocaml-multicore/tezos#8
    ci.Dockerfile throws warning

    The numerics library which enforced c99 has been removed from
    Tezos, and hence this warning should not occur.

  • ocaml-multicore/domainslib#55
    setup_pool: option to exclude the current/first domain?

    The use case to not include the main thread as part of the pool is a
    valid request. The use of async_push can help with the same:

    (* main thread *)
    let pool = setup_pool ~num_additional_domains () in
    let promise = async_push pool initial_task in
    (* the workers are now executing the [initial_task] and
       its children. main thread is free to do its thing. *)
    ....
    (* when it is time to terminate, for cleanup, you may optionally do *)
    let res = await pool promise (* waits for the promise to resolve, if not already *)
    teardown_pool pool
    
  • ocaml-multicore/eio#91
    [Discussion] Object Capabilities / API

    An open discussion on using an open object as the first argument of
    every function, and to use full words and expressions instead
    network, file_systems etc.

Completed

Eio

  • ocaml-multicore/eio#86
    Update README to mention libuv backend

    The README.md file has been updated to mention that the library
    provides a generic backend based on libuv, that works on most
    platforms, and has an optimised backend for Linux using io-uring.

  • ocaml-multicore/eio#89
    Marking uring as vendored breaks installation

    The use of pin-depends for uring to avoid any vendoring
    installation issues with OPAM.

  • ocaml-multicore/eio#90
    Implicit cancellation

    A lib_eio/cancel.ml has been added to Eio that has been split
    out of Switch. The awaiting promises use the cancellation context,
    and many operations no longer require a switch argument.

  • ocaml-multicore/eio#92
    Update trace diagram in README

    The trace diagram in the README file has been updated to show two
    counting threads as two horizontal lines, and white regions
    indicating when each thread is running.

  • ocaml-multicore/eio#93
    Add Fibre.first

    The Fibre.first returns the result of the first fibre to finish,
    cancelling the other one. A tests/test_fibre.md file has also been
    added with this PR.

  • ocaml-multicore/eio#94
    Add Time.with_timeout

    The module Time now includes both with_timeout and
    with_timeout_extn functions to lib_eio/eio.ml.

  • ocaml-multicore/eio#95
    Track whether cancellation came from parent context

    A Cancelled exception is raised if the parent context is asked to
    exit, so as to propagate the cancellation upwards. If the
    cancellation is inside, the original exception is raised.

  • ocaml-multicore/eio#96
    Add Fibre.all, Fibre.pair, Fibre.any and Fibre.await_cancel

    The all, pair, any and await_cancel functions have been
    added to the Fibre module in libe_eio/eio.ml.

  • ocaml-multicore/eio#97
    Fix MDX warning

    The tests/test_fibre.md file has been updated to fix MDX warnings.

  • ocaml-multicore/eio#98
    Keep an explicit tree of cancellation contexts

    A tree of cancellation contexts can now be dumped to the output, and
    this is useful for debugging.

  • ocaml-multicore/eio#99
    Make enqueue thread-safe

    Thread-safe promises, streams and semaphores have been added to
    Eio. The make bench target can test the same:

    dune exec -- ./bench/bench_promise.exe
    Reading a resolved promise: 4.684 ns
    use_domains,   n_iters, ns/iter, promoted/iter
          false,  1000000,   964.73,       26.0096
           true,   100000, 13833.80,       15.7142
    
    dune exec -- ./bench/bench_stream.exe
    use_domains,  n_iters, capacity, ns/iter, promoted/iter
          false, 10000000,        1,  150.95,        0.0090
          false, 10000000,       10,   76.55,        0.0041
          false, 10000000,      100,   52.67,        0.0112
          false, 10000000,     1000,   51.13,        0.0696
           true,  1000000,        1, 4256.24,        1.0048
           true,  1000000,       10,  993.72,        0.2526
           true,  1000000,      100,  280.33,        0.0094
           true,  1000000,     1000,  287.93,        0.0168
    
    dune exec -- ./bench/bench_semaphore.exe
    use_domains,  n_iters, ns/iter, promoted/iter
          false, 10000000,   43.36,        0.0001
           true, 10000000,  303.89,        0.0000
    
  • ocaml-multicore/eio#100
    Propogate backtraces in more places

    The libe_eio/fibre.ml and lib_eio_linux/eio_linux.ml have been
    updated to allow propagation of backtraces.

Tezos

  • ocaml-multicore/tezos#10
    Fix make build-deps, fix NixOS support

    The patch fixes make build-deps/build-dev-deps, and conf-perl
    has been removed from the tezos-opam-repository.

  • ocaml-multicore/tezos#15
    Fix scripts/version.h

    The CI build failure is now fixed with proper exporting of variables
    in scripts/version.h file.

  • ocaml-multicore/tezos#16
    Fix make build-deps and make build-dev-deps to install correct OCaml switch

    The hardcoded OCaml switches have now been removed from the script
    file and the switch information from script/version.h is used with
    make build-deps and make build-dev-deps targets.

  • ocaml-multicore/tezos#17
    Enable CI on pull request to 4.12.0+domains branch

    CI has been enabled for pull requests for the 4.12.0+domains branch.

  • ocaml-multicore/tezos#20
    Upstream updates

    A merge of the latest upstream build, code and documentation changes
    from Tezos repository.

Sundries

  • ocaml-multicore/tezos-opam-repository#6
    Updates

    The dependency packages in the tezos-opam-repository have been
    updated, and mtime.1.3.0 has been added as a dependency.

  • ocaml-multicore/ocaml-uring#40
    Remove test dependencies on Bos and Rresult

    The Bos and Rresult dependencies have been removed from the
    project as we already depend on OCaml >= 4.12 which provides the
    required functions.

  • ocaml-multicore/ocaml-uring#42
    Handle race in test_cancel_late

    A race condition from test_cancel_late in tests/main.ml has been
    fixed with this merged PR.

  • ocaml-multicore/domainslib#51
    Utilise effect handlers

    The tasks are now created using effect handlers, and a new
    test_deadlock.ml test has been added.

Benchmarking

Sandmark and Sandmark-nightly

Ongoing

  • ocaml-bench/sandmark-nightly#21
    Add 5.00 variants

    Multicore OCaml now tracks OCaml trunk, and 4.12.0+domains+effects
    and 4.12+domains will only have bug fixes. The following variants
    are now required to be included in sandmark-nightly:

    • OCaml trunk, sequential, runtime (throughput)
    • OCaml 5.00, sequential, runtime
    • OCaml 5.00, parallel, runtime
    • OCaml trunk, sequential, pausetimes (latency)
    • OCaml 5.00, sequential, pausetimes
    • OCaml 5.00, parallel, pausetimes
  • ocaml-bench/sandmark#262
    ocaml-migrate-parsetree.2.2.0+stock fails to compile with ocaml.5.00.0+trunk

    The ocaml-migrate-parsetree dependency does not work with OCaml
    5.00, and we need to wait for the 5.00 AST to be frozen in order to
    build the package with Sandmark.

  • A package_remove feature is being added to the -main branch of
    Sandmark that allows to dynamically remove any dependency packages
    that are known to fail to build on recent development branches.

Completed

  • ocaml-bench/sandmark-nightly#22
    Fix dataframe intersection order issue

    The dataframe_intersection function has been updated to properly
    filter out benchmarks that are not present for the variants that are
    being compared.

  • ocaml-bench/sandmark#248
    Coq fails to build

    A new Coq tarball,
    coq-multicore-2021-09-24,
    is now used to build with Sandmark for the various OCaml variants.

  • ocaml-bench/sandmark#257
    Added latest Coq 2019-09 to Sandmark

    The Coq benchmarks in Sandmark now build fine for 4.14.0+domains and
    OCaml 5.00.

  • ocaml-bench/sandmark#260
    Add 5.00 branch for sequential run. Fix notebook.

    Sandmark can now build the new 5.00 OCaml variant to build both
    sequential and parallel benchmarks in the CI.

  • ocaml-bench/sandmark#261
    Update benchmark and domainslib to support OCaml 4.14.0+domains (OCaml 5.0)

    We now can build Sandmark benchmarks for OCaml 5.00, and the PR
    updates to use domainslib.0.3.2.

current-bench

Ongoing

  • ocurrent/current-bench#219
    Support overlay of graphs from different compiler variants

    At present, we are able to view the front-end graphs per OCaml
    version. We need to overlay graphs across compiler variants for
    better comparison and visualization.

  • ocurrent/current-bench#220
    Setup current-bench and Sandmark for nightly runs

    On a tuned machine, we need to setup current-bench (backend and
    frontend) for Sandmark and schedule nightly runs.

  • ocurrent/current-bench#221
    Support developer repository, branch and commits for Sandmark runs

    A request to run current-bench for developer branches on a nightly
    basis to visualize the performance benchmark results per commit.

Completed

  • ocurrent/curren-bench#106
    Use --privileged with Docker run_args for Multicore OCaml

    The current-bench master (3b3b31b...) is able to run Multicore
    OCaml Sandmark benchmarks in Docker without requiring the
    --privileged option.

  • ocurrent/current-bench#146
    Replicate ocaml-bench-server setup

    current-bench now supports the use of a custom bench.Dockerfile
    which allows you to override the TAG and OCaml variants to be used
    with Sandmark.

  • ocurrent/current-bench#190
    Allow selected projects to run on more than one CPU

    A OCAML_BENCH_MULTICORE_REPOSITORIES environment variable has been
    added to build projects on more than one CPU core.

  • ocurrent/current-bench#195
    Add instructions to start just frontend and DB containers

    The HACKING.md file has been updated with instructions to just start
    the frontend and database containers. This allows you to run
    benchmarks on any machine, and use an ETL script to dump the results
    to the database, and view them in the current-bench frontend.

We would like to thank all the OCaml users, developers and
contributors in the community for their continued support to the
project. Stay safe!

Acronyms

  • AFL: American Fuzzy Lop
  • API: Application Programming Interface
  • AST: Abstract Syntax Tree
  • AWS: Amazon Web Services
  • CI: Continuous Integration
  • CPU: Central Processing Unit
  • DB: Database
  • DLS: Domain Local Storage
  • ETL: Extract Transform Load
  • GC: Garbage Collector
  • IO: Input/Output
  • MD: Markdown
  • MLP: ML-File Preprocessed
  • OOM: Out of Memory
  • OPAM: OCaml Package Manager
  • OS: Operating System
  • PR: Pull Request
  • PRNG Pseudo-Random Number Generator
  • RFC: Request For Comments
  • WG: Working Group
42 Likes

2021 → 2022, I suspect?

Indeed, fixed! Although time is now dilating strangely this far into the pandemic…

7 Likes

I believe it was mentioned earlier that MSVC would not be in the MVP, which made sense when it was articulated. But since this C11 blocking issue was raised, is one of the WG2 Post-5.00 action items to add back in support for MSVC?

If I am reading the summary correctly, the C11 optional feature that WG2 relies on is the _Atomic type constructor keyword. That would seem to mean there are only two paths to get OCaml 5+ to work with MSVC … a) wait for Microsoft to add atomics to its existing C11 support or b) write constructor+accessor+mutator macros that use InterlockedAdd, etc. on Windows and atomic_fetch_add for supporting compilers. Option “a” is a very risky wait and option “b” can be very expensive to do if not done early in the coding. Do we have a good path forward past the MVP?

Thanks. Appreciate the comprehensive updates!

1 Like

Another course of action would be to use Clang, e.g. clang-cl which supports this feature, I think. It should support cl-compatible build options and work with MSVC-compatible build systems out of the box. Chrome and Firefox have both used it for the official Windows builds for years already.

1 Like

Option b is the plan - adding Windows at all to the MVP only happened because mingw-w64 has the required C11 support and the winpthreads exists. I expect MSVC support to be re-added prior to 5.00

In particular we are looking for examples of relevant benchmarks that let us measure the performance of Lazy: both existing (single-threaded) usages of Lazy, and potential usage in a multicore context to evaluate efficient synchronisation schemes. Feel free to contribute!

Sounds like great planning; thanks!

Learned something new. That may explain why clang-cl is an install option in Visual Studio Installer. Thanks.

We currently have ocaml 4.12.0+domains. It is able to compile a significant chunk of the current Ocaml opam ecosystem already. I am able to compile package with a lot of dependencies like the dream web framework. Things seem to work well already on the multicore compiler.

But from the Nov 2021 update above, it seems a lot of of work is still remaining.

Are things working now because the current opam ecosystem is basically running on only a single multicore domain? Is the bulk of the work remaining to make sure that things like stdlib and other stuff not handled by the multicore team so far (like dynlink) can work properly when they run in multiple domains?

Now eio is able to run in multiple domains as I understand it. How do I know which of the underlying stdlib methods will “break” if I use them in Eio? Does the stdlib that comes with the 4.12+multicore branch already have a somewhat “patched” stdlib that can work on multiple domains?

Essentially a short list of tips on what is OK and what should be avoided when working with Eio+ OCaml 4.12+domains will be appreciated.

Finally, I’ve been following the new ocamlnat/ocaml-jit project with some interest – I’m wondering if it would be a useful idea to (1) backport the patches mentioned here to 4.12+domains (2) Then try get ocamljit working there (and since dynlink is involved, only on a single domain).

(Disclaimer: I’m not a Multicore OCaml developer, so my understanding below is to be taken with a grain of salt.)

4.12.0+domains is an experimental fork of the compiler codebase, not an actual “OCaml release” containing multicore. Testing software with it is great to make the multicore implmentation more robust, but you probably shouldn’t plan long-term development/maintenance on it.

Right now there is a lot of effort towards merging the multicore fork back into the upstream OCaml distribution; the plan is to release this as 5.00. This version will be maintained in the long term (with new 5.x relases, etc.).

I don’t really see a point trying to rebase ocamljit stuff (or other experiments on top of the OCaml compiler codebase) on 4.12+domains when we hope to have a “real” 5.00 release soon.

Yes, exactly. Most of the existing code has not been tested to work well in presence of actual parallelism, and all the code that relies on global mutable state (which is most OCaml code) will have to be changed to support this well.

And the “best” way to do this is still not fully known yet (eio is interesting, but as far as I know this is an experimental library that is not yet seen a single release). The 5.00 release of the compiler distribution is aimed at getting the minimal stuff (domains for parallelism) working well, but it will not provide much user-level library support to write concurrent programs, only basic building blocks. This means that after the release, there will be a time for experimentation with various libraries, but it may take some time for standard practices to emerge to make the code concurrency-friendly.

Patience, as they say, is the mother of all virtues.

8 Likes

Shouldn’t the abstractions that eio is trying to implement make it into the stdlib? Otherwise we might end up with multiple libraries cooking the parallelism in different incompatible ways, a new Lwt/Async story.

This does hurt the ecosystem, take a look at this ocaml-kafka issue for example: modern Kafka API support gets added to Async flavour but not to Lwt one, because the PR author is using Async and does not have have extra time to spend on Lwt flavor as they don’t need it. Time is a valuable resource and none of us have it. I would probably take similar approach myself if I were to contribute some large changes to an Lwt/Async capable library - implement what I know and need and try to upstream that, or just fork it and iterate privately on the Lwt part to bring in what I need and unblock myself.

I see how baking event loop and io into stdlib makes Golang ecosystem very coherent, bits from different authors are composable and compatible, as rely on the same goroutines/event-loop-runtime.

1 Like

It would be nice to have eio in stdlib. But I wager that this might not be a good idea for several reasons right now:

  • Eio is still experimental and in-progress. There is a really a high threshold for getting something into the OCaml standard library. Because once it gets there, it effectively needs to be supported for a long time. From the above update there is plenty of work remaining for OCaml 5.00. Aiming to put something like Eio in stdlib will probably add more delays as the design would need to find consensus, be polished enough to get into stdlib etc. This way OCaml 5.00 and eio remain decoupled for maximum innovation.
  • No one knows what a “standard” effects based stdlib should truly look like because work on effects has been quite academic/experimental so far. Adding something like eio to ocaml stdlib risks freezing experimentation and innovation. Maybe in a future version of OCaml, once effects have been around for a while there could be something in stdlib that gives you a more “batteries included” experience?
  • The point of effects to some extent is to provide the language support and let people build things with different backends on it. You can have a MS windows asynchronous api backend, Linux uring, libuv, libev, poll/select/kqueue, system threads etc. etc. backend to perform your effects on. Each of them will have different advantages/disadvantages. These backends could be transparent to the user. For instance, the eio library has a common API for both the libuv and io_uring backend. You can switch between them trivially as long as you are using the common eio API and not something specific to io_uring/libuv. This is quite different with Lwt and Async which are “incompatible” in many ways with each other. The effect infrastructure makes it easier to abstract the backend in a much better way than we can today. The point here is that OCaml seems to have taken a decision not to bless a particular backend. In the declarative spirit of functional languages, effects make it easy to specify what to do. We can add a backend to specify how we want to do it.
  • The future of effects involves adding types to track effects. That is the next step in OCaml’s evolution. That might require some changes to effects that may be backward incompatible. Adding eio to the stdlib means (a) there would be a library based on untyped effects in stdlib – its design will add more inflexibility to future planned changes (b) OCaml core team may eventually want to have something based on typed effects in the stdlib (I don’t know, I’m guessing). Keeping something like eio out right now gives more room for future flexibility.

BTW please correct me if I’m wrong in anyway. This is just based on my understanding from learning about OCaml effects/multicore on this forum, papers and GitHub.

8 Likes

BTW this point of yours is well taken and this caused me to think some more about your question. In Golang one never needs to worry about the backend because Golang provides its own Goroutines multiplexed upon system threads core. This causes everything in the ecosystem to compose and not fragment like you very accurately said. The value of that is quite incalculable. With effects one could do specialized and fancy things but there is some value in unifying around a backend in OCaml.

OCaml traditional use case is not to write web servers but compilers, theorem provers, type checkers and other “industrial” software. Now OCaml 5.00 does provide a default backend in which effects can be executed and those are domains! Domains may appear quite old fashioned and insufficient when it comes to building web servers but will get you a long way. However, the problem of fragmentation will still remain when you are seeking more concurrency…

I think there is a lot more innovation that may happen in the future in OCaml in this space. It’s very early days yet. There could be some unification in the future but some amount of fragmentation will continue with effects.

1 Like

I agree that pulling ‘eio’ into stdlib tomorrow makes little sence, given that both eio and effects are pretty new. Probably some confirmation that eio is to be included into the list of core OCaml tools/libraries should help - yes, it’s still experimental, but experimentation will result in production grade concurrency library. dune is not in the stdlib and is not even distributed with OCaml compiler, yet I feel confident and safe when I use it to build all of my projects. dune is truly awesome on its own, but I believe that not its awesomeness alone pushed it to the dominating build system position, there was some official blessing and evangelism along with some migration efforts.

Lwt also supports multiple backends, but that does not help when I need a library, that gives me 'a Deferred.t while I can only consume 'a Lwt.t. Backend abstractions do not make libraries compatible on their own… Event loop libraries (libev, libuv) are typically monopolistic and expect only one owner within a process. Co-existing multiple event loops or having multiple copies of same event loop is usually more of a corner case, and will cause headaches for those who try to do that. IIRC there is no good recipe to integrate Lwt and Async in one OCaml program. Effects provide a solution to switch stacks, that allows to get rid of monadic wrapping for asynchronous values, but you still have to wrap all of your computation into something that knows how to handle certain effects and I’m not sure that multiple IO effects can be composed this way?

1 Like

I think the short answer is that we don’t know yet. Someone needs to go though the stdlib modules and other libraries and mark each one as thread-safe or not (using the new ocamldoc tags).

There are usually three cases:

  1. The module uses global mutable state and will crash if used from multiple domains (the aim is to fix such modules). e.g. Format used to fail but is now fixed.
  2. The module itself can be used from multiple domains, but values created cannot be shared. e.g. two domains can use Queue.create safely, but a single queue cannot be shared between domains. This will be the common case.
  3. The values themselves can be safely shared. e.g. Eio’s promises, streams and semaphores are supposed to be safe to share, and can be used to communicate between domains.

Also, note that the multi-domain support in Eio is very new. I’m still going through the code and fixing some things there.

4 Likes

OCaml is a great language for writing network-facing services, like web servers. It really shines when those are not just Flask-like APIs, but complex distributed systems with highly complex internal architecture. Rust certainly can do well in that field, but it’s causing some fatigue due to borrow checker that is not really performance-justified in many cases. Go can also do well in that field, but causes fatigue due to extreme lack of expressiveness and occasional nil-related bugs ruin the fun. Recent OCaml survey shows increasing interest in this field, and with multicore and decent concurrency story OCaml can attract a share of that market in cases when managed memory performance is acceptable, but nil bugs are not.

Native thread per HTTP request is definitely a no go in modern world and more concurrency will always be required. It would be really sad to have the fragmented ecosystem here in the long run :frowning:

Wow, looks like eio schedules all fibers to one domain by default and if user explicitly is willing to throw some fibers to other physical thread, they are able to do that. That’s fantastic! Golang is doing automatic scheduling of goroutines to physical threads, which is quite annoying as one has to mind races coming from physical threads literally everywhere. Opt-in threading allows the user to split the logic into asynchronous pieces each running on one physical thread (so no data races out of nowhere), and organize communication between those pieces in a thread-safe manner.

3 Likes

So here is an example of the advantage of OCaml’s customized approach benefitting you. Golang has chosen the defaults and those defaults are not ideal as you have explained.

That is the problem with defaults. Defaults improve sharing through standardization but defaults also hurt specialization. OCaml effects will allow you to build a very customized scheduler for your program.

I don’t want to build a scheduler and write bindings for all the libraries I need to make them work with that home-grown scheduler… If eio decides to make automatic balancing of fibers to cores, that would be unfortunate, but as long as that gives me all the libraries at a distance of opam install, I would live with that (and enjoy sporadic thread races)…

Looks like we’re getting closer and closer to bikeshedding territory.

Clarification: Here I didn’t mean “you” as the end programmer but rather “you” as the builder of a library such as eio. This is something that is not really possible in Golang as the Goroutines are baked in.

Yes I agree. Programmers will mostly just pickup something like eio and run with that. (And if you’re a big company and have very specific requirements you can then decide to build your own scheduler).

Thanks for the interesting discussion so far – I think we already agree on a lot of points. I’ll cease the back and forth in the interest of not lengthening this already long thread…

1 Like

It would be helpful if a set of common effects would be defined, though,
and each library can then come up with its own handlers. For example I’d
rather not have one type of IO channel per library, they should all use
the standard one (or a standard one), along with standard effects for
read/write/flush/…. I imagine it’s not going to be possible in practice,
but one can dream.

4 Likes