Multicore OCaml: January 2020 update

Welcome to the January 2020 news update from the Multicore OCaml team! We’re going to summarise our activities monthly to highlight what we’re working on throughout this year. This update has kindly been assembled by @shakthimaan and @kayceesrk.

The most common question we get is how to contribute to the overall multicore effort. As I noted last year, we are now in the process of steadily upstreaming our efforts to mainline OCaml. Therefore, the best way by far to contribute is to test for regressions or opportunities for improvements in the patches that are outstanding in the main OCaml repository.

A secondary benefit would be to review the PRs in the multicore repository, but those tend to be more difficult to evaluate externally, as the issues they address are currently being spotted through stress testing. A negative contribution would be to raise discussion of orthogonal features or new project management mechanisms – this takes time and effort to reply to, and the team has a very full plate already now that the upstreaming has begun. We don’t want to prevent those discussions from happening of course, but would appreciate it if they were directed to the general OCaml bugtracker or another thread on this forum.

We’ll first go over the OCaml PRs and issues, then cover the multicore repository and our Sandmark benchmarking infrastructure. A new initiative to implement and test new parallel algorithms for Multicore OCaml is also underway.

OCaml

Ongoing

  • ocaml/ocaml#9082 Eventlog tracing system

    Eventlog is a proposal for a new tracing facility for the OCaml runtime that provides metrics and counters, and uses the Common Trace Format (CTF). The next step to get this merged is to incubate the tracing features in a separate runtime variant, so it can be selected at application link time.

  • ocaml/ocaml#8984 Towards a new closure representation

    A new layout for closures has been proposed that allows the garbage collector to traverse them without the use of a page table. This is very useful for Multicore OCaml and for performance improvements. The PR is awaiting review from other developers, and can then be rebased against trunk for testing and merging.

  • ocaml-multicore/ocaml-multicore#187 Better Safe Points

    A patch to regularly poll for inter-domain interrupts to provide better safe points is actively being reviewed. This is to ensure that any pending interrupts are noticed by the runtime system.

  • Work is underway on improving the marshaling (runtime/extern.c) in upstream OCaml to avoid using GC mark bits to represent visitedness, and to use a hash table (addrmap) implementation.
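To illustrate the idea behind the marshaling item above, here is a minimal OCaml sketch of recording visitedness in a side hash table rather than by marking the objects being traversed. It is not the runtime/extern.c code (which is C); the `node` type, its `id` field (standing in for an object address) and `count_reachable` are hypothetical names introduced only for this example.

```ocaml
(* Hypothetical graph node; the integer id stands in for an object's address. *)
type node = { id : int; mutable succs : node list }

(* Count the nodes reachable from [root]. Visited nodes are recorded in a
   side table keyed by id, so the nodes themselves are never marked. *)
let count_reachable (root : node) : int =
  let visited : (int, unit) Hashtbl.t = Hashtbl.create 16 in
  let rec go stack =
    match stack with
    | [] -> ()
    | n :: rest ->
      if Hashtbl.mem visited n.id then go rest
      else begin
        Hashtbl.add visited n.id ();
        go (n.succs @ rest)
      end
  in
  go [ root ];
  Hashtbl.length visited

let () =
  (* A small cyclic graph: a -> b -> c -> a, plus a -> c. *)
  let a = { id = 0; succs = [] } in
  let b = { id = 1; succs = [] } in
  let c = { id = 2; succs = [] } in
  a.succs <- [ b; c ];
  b.succs <- [ c ];
  c.succs <- [ a ];
  Printf.printf "reachable: %d\n" (count_reachable a)
```

Because the traversal only reads the nodes, it leaves no state behind in the objects themselves, which is the property that makes a side table attractive when the mark bits belong to the GC.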

Completed

The following PRs have been merged to upstream OCaml trunk:

  • ocaml/ocaml#8713 Move C global variables to a dedicated structure

    This PR moves the C global variables to a “domain state” table. Every domain requires its own table of domain-local variables, and hence this is required for the Multicore runtime.

    This uncovered a number of compatibility issues with the C header files, the fixes for which were all included in the recent OCaml 4.10.0+beta2 release via the next item.

  • ocaml/ocaml#9253 Move back caml_* to thematic headers

    The caml_* definitions have been moved from runtime/caml/compatibility.h back to their thematic headers to provide a compatible API for OCaml versions 4.04 to 4.10. This change is also useful for Multicore domains that have their own state.
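As a rough OCaml-level analogy for the “domain state” table described above (the actual change is in the C runtime), the sketch below gives each domain its own copy of a variable. It assumes the `Domain` and `Domain.DLS` modules as they exist in OCaml 5; `counter_key`, `bump`, `read` and `bumps_on_new_domain` are hypothetical names for this example.

```ocaml
(* Assumes OCaml 5's Domain and Domain.DLS modules. *)

(* A per-domain counter: each domain lazily gets its own copy, starting at 0. *)
let counter_key : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

let bump () = incr (Domain.DLS.get counter_key)
let read () = !(Domain.DLS.get counter_key)

(* Run [n] bumps on a fresh domain and return that domain's final count. *)
let bumps_on_new_domain n =
  Domain.join
    (Domain.spawn (fun () ->
         for _i = 1 to n do
           bump ()
         done;
         read ()))

let () =
  bump ();
  let child = bumps_on_new_domain 5 in
  (* The child's bumps do not affect the parent's copy of the counter. *)
  Printf.printf "child=%d parent=%d\n" child (read ())
```

A single global `int ref` would instead be shared by, and raced on by, every domain, which is the situation the per-domain table avoids.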

Multicore OCaml

The following PRs have been merged into the Multicore OCaml trees:

  • ocaml-multicore/ocaml-multicore#275
    Fix lazy behaviour for Multicore

    A caml_obj_forward_lazy() function is implemented to handle lazy values in Multicore OCaml.

  • ocaml-multicore/ocaml-multicore#269
    Move from a global pools_to_rescan to a domain-local one

    During stress testing, a segmentation fault occurred when a pool was being rescanned while a domain was allocating into it. The list of pools to rescan is now domain-local, and hence this situation will not occur again.

  • ocaml-multicore/ocaml-multicore#268
    Fix for a few space leaks

    The space leaks that occurred during domain spawning and termination when performing the stress tests have been fixed in this PR.

  • ocaml-multicore/ocaml-multicore#272
    Fix for DWARF CFI for non-allocating external calls

    The entry to caml_classify_float_unboxed caused a corrupted backtrace, and a fix that clearly specifies the boundary between OCaml and C has been provided.

  • An effort to implement a synchronized minor garbage collector for Multicore OCaml is actively being researched and worked upon. Benchmarking for a work-sharing parallel stop-the-world branch against multicore trunk has been performed along with clearing technical debt, handling race conditions, and fixing segmentation faults. The C-API reversion changes have been tested and merged into the stop-the-world minor GC branch for Multicore OCaml.

Benchmarking

  • The Sandmark performance benchmarking infrastructure has been improved for backfilling data, tracking branches and naming benchmarks.

  • Numerical parallel benchmarks have been added to the Multicore compiler.

  • An Irmin macro benchmark has been included in Sandmark. A benchmark measuring Irmin’s merge capabilities with Git as its filesystem is being tried with different read and write rates.

  • Work is also underway to implement parallel algorithms for N-body, reverse-complement, k-nucleotide, binary-trees, fasta, fannkuch-redux, regex-redux, Game of Life, RayTracing, Barnes Hut, Count Graphs and SSSP, as well as algorithms from the MultiMLton benchmarks, to test on Multicore OCaml.

Documentation

  • A chapter on Parallel Programming in Multicore OCaml is being written and an early draft will be made available to the community for their feedback. It is based on Domains, with examples to implement array sums, Pi approximation, and trapezoidal rules for definite integrals.
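As a taste of the kind of example mentioned above, here is a minimal sketch of a parallel array sum using Domains. It is not taken from the draft chapter; it assumes `Domain.spawn` and `Domain.join` as found in OCaml 5, and `sum_range` and `parallel_sum` are hypothetical names. `num_domains` is assumed to be at least 1.

```ocaml
(* Assumes OCaml 5's Domain module (Domain.spawn / Domain.join). *)

(* Sum arr.(lo) .. arr.(hi - 1) sequentially. *)
let sum_range arr lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + arr.(i)
  done;
  !acc

(* Split [arr] into [num_domains] contiguous chunks, sum each chunk on its
   own domain, then add up the partial sums in the calling domain. *)
let parallel_sum ~num_domains arr =
  let n = Array.length arr in
  let chunk = (n + num_domains - 1) / num_domains in
  let workers =
    List.init num_domains (fun d ->
        let lo = min n (d * chunk) in
        let hi = min n (lo + chunk) in
        Domain.spawn (fun () -> sum_range arr lo hi))
  in
  List.fold_left (fun total w -> total + Domain.join w) 0 workers

let () =
  let arr = Array.init 1_000 (fun i -> i + 1) in
  Printf.printf "sum = %d\n" (parallel_sum ~num_domains:4 arr)
```

Each domain only reads its own slice of the array and returns a partial result through `Domain.join`, so no locking is needed. For an array this small the cost of spawning domains will dominate; the split only pays off on much larger inputs.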

Acronyms

  • API: Application Programming Interface
  • CTF: Common Trace Format
  • CFI: Call Frame Information
  • DWARF: Debugging With Attributed Record Formats
  • GC: Garbage Collector
  • PR: Pull Request
  • SSSP: Single Source Shortest Path

If I may ask a question, I am curious about the status of integration of effects into the type system. According to this page https://ocamlverse.github.io/content/future_ocaml.html#typed-algebraic-effects, the original plan was to merge an untyped version of effects, before it was decided to integrate them into the type system. I have seen this presentation by Leo White on this matter https://www.janestreet.com/tech-talks/effective-programming/ along with this one https://www.youtube.com/watch?v=ibpUJmlEWi4 (from 2016). My understanding was that, at the time of the last presentation, there were still some theoretical issues to be solved (although the speaker did not seem too worried about finding some way around them eventually). I have no idea about the current status of the project. Reading your post, it seems that you are now in an integration phase (PR reviews and all), which would imply that you’re done with (most) theoretical questions. But that could mean either that you are integrating an untyped version of effects (and the type system is left for future development) or that you have indeed settled on a design. Which one is it? Anyway, thanks for the post and the work in general, this project seems awesome (even if I did not dive into it too much until now).

Good question; our current focus is on getting the runtime components upstreamed (the “Domains” API) and some of the mechanisms that could be used by an effect system. We haven’t yet settled on a final design for an effect extension to OCaml, but the general preference is to skip integrating an untyped effect system if a typed version lands in the right timescales. This will happen after all the runtime pieces are upstreamed, which will allow everyone to use multicore parallelism via the lower-level Domains API.

Thanks for the detailed update!

This reassures me that the multicore runtime is a matter of when rather than if, and that’s very welcome to hear.

Thinking about this, I wonder if transitional implementations of Domain and Atomic for trunk OCaml, written in terms of Thread, could be a useful extension for facilitating transition?

Will Unix.fork still be available in multicore, or will it be deprecated? It seems like quite an effort to implement it in a safe and robust way, and many (most?) languages avoid it by introducing some sort of fork_and_exec.

It currently requires there be no more than one live domain. The existing create-process functions are implemented in terms of Unix.fork so they have this limitation as well, but I wrote a pending patch that reimplements them as external C functions. It’s not a perfect replacement though so it’s going to require some wrangling before it can be merged.

What is the compatibility story for programs that rely on fork? They’re supposed to do it before starting other domains?

I’m asking because I have code that redoes subprocess creation functions using fork+exec, because the standard ones are too limited (no way of getting all the info at once, and no way of implementing a timeout).

@c-cube this raises three questions:

  1. Would there be a reason for others to use your code instead of the Spawn library?
  2. If yes, is your code packaged as a nice library that people can reuse?
  3. With your black belt in getting standard-library changes merged, have you considered submitting some of the low-hanging fruits to the upstream?

It’s not packaged, and it’s a bit specific, sorry. Still, given that even lwt.unix seems to use fork, seems like a deprecation/alert would be better in the short term than a straight removal? Especially if one wants code that can run both on normal OCaml and multicore OCaml (avoid the python2/3 split).

As a casual external observer, this is really good news! I am very happy that the contributor team is transparent and gives status updates. As another poster said, it sends the clear message that Multicore OCaml is a matter of “when” and not a matter of “if”.

As a small random anecdata, I’m waiting for Multicore OCaml and will then absolutely positively start preaching it wherever I go and do the occasional consulting. (One data point I’d present is how much faster the OCaml compiler is compared to Rust’s.)

To the core team: you are doing great work and might very well be shaping the future of programming and IT in general. Never forget that you are important and your work is appreciated! :vulcan_salute:

Don’t wait, Dimitar. OCaml is Ready. Spread that word now :wink:

Well, some of us are very spoiled by the Erlang runtime. Guilty!

Still though, anything that can lead to parallel iterators that multiplex work on all CPU cores will be very welcomed! And if you guys actually make actors with message inboxes, yeah, I am ditching Elixir the next day. :smiley:

To clarify on your point in particular, to me a single-threaded runtime is more suited for scripting and quick one-off tasks. OCaml brings a huge value there as opposed to something like e.g. Python because OCaml has static typing.

Another example to complement my earlier comments: I am working on finding and importing small-to-medium-sized datasets (up to 200GB) and writing code in several languages to parse the data and import into Postgres and sqlite3. Parallelism for this recent hobby of mine is paramount (not for sqlite3 though, it can’t write in parallel; I should consider making the sqlite3 versions in OCaml in addition to the other languages).

I disagree (and so does every serious server written in Node, Ruby, Python, etc.) but I know this is your personal opinion so I’m not really looking to argue about it.

In this ‘embarrassingly parallel’ example, like in many others, you actually don’t need Multicore OCaml. You can quite easily spin up multiple processes and do the data parsing.

Given the fundamental incompatibility of fork and multiple threads, this is the cleanest solution. fork with multiple running domains will raise an exception. If there are alternate suggestions, please do let me know as we haven’t frozen this decision yet.