A tutorial on parallel programming in OCaml 5

I ran a hands-on tutorial on the new parallel programming primitives in the upcoming OCaml 5 at the Tarides off-site last week. It covers the low-level parallelism primitives exposed by the OCaml 5 compiler as well as high-level parallel programming using domainslib. I hope you like it and find it useful. Please feel free to open issues if you find anything amiss.

It is not immediately clear to me: does it use threads, green threads, processes, or fibers? And who is responsible for the scheduling: the OCaml application or the underlying operating system?

Each domain corresponds to one system thread. The scheduling between them is therefore performed by the operating system.
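
As a rough illustration of the point above, here is a minimal sketch that uses only Domain.spawn and Domain.join from the OCaml 5 standard library. The names sum_range and parallel_sum are made up for this example; the tutorial may structure things differently.

```ocaml
(* Sum the integers in [lo, hi) sequentially. *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + i
  done;
  !acc

(* Split the work across two spawned domains. Each spawned domain is backed
   by its own system thread, so the operating system schedules them.
   Domain.join blocks until the domain returns its result. *)
let parallel_sum n =
  let mid = n / 2 in
  let lower = Domain.spawn (fun () -> sum_range 0 mid) in
  let upper = Domain.spawn (fun () -> sum_range mid n) in
  Domain.join lower + Domain.join upper

let () = Printf.printf "sum = %d\n" (parallel_sum 1_000_000)
```

Since domains are heavy-weight, a real program would likely spawn only about as many of them as there are cores.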

The tutorial only covers domains, which are the way to perform parallelism in OCaml 5. To use concurrency (e.g. having several IO-dependent operations that run concurrently on the same core), the main mechanism is effects (which, at the level of the runtime system, are implemented using small stack segments called fibers), as in the eio library. Effects allow such libraries to provide a form of lightweight threads (aka green threads) whose scheduling is implemented in the OCaml application using effect mechanisms.
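
To make the green-threads idea concrete, here is a toy round-robin scheduler written with the Effect module of the OCaml 5 standard library. It is only a sketch: the Yield effect and the functions yield, run_all and worker are hypothetical names for this example, and this is not how eio is implemented.

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effect for this sketch: a task asks to be descheduled. *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* Run all [tasks] to completion on the current domain, round-robin.
   The scheduling decisions are made here, in OCaml code, not by the OS. *)
let run_all (tasks : (unit -> unit) list) : unit =
  let ready : (unit -> unit) Queue.t = Queue.create () in
  let run_next () =
    match Queue.take_opt ready with
    | Some resume -> resume ()
    | None -> ()
  in
  let spawn task =
    match_with task ()
      { retc = (fun () -> run_next ());   (* task finished: run another *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the suspended task and switch to the next one. *)
              Queue.add (fun () -> continue k ()) ready;
              run_next ())
          | _ -> None) }
  in
  List.iter (fun t -> Queue.add (fun () -> spawn t) ready) tasks;
  run_next ()

(* Usage: two tasks that interleave by yielding after each step. *)
let log = ref []

let worker name () =
  for i = 1 to 2 do
    log := Printf.sprintf "%s%d" name i :: !log;
    yield ()
  done

let () =
  run_all [ worker "a"; worker "b" ];
  List.iter print_endline (List.rev !log)
```

Everything here runs on a single domain; combining such a scheduler with several domains is what libraries in this space take care of.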

Thank you for spending the time doing this. One query.

You say “Whenever a domain exhausts its minor heap arena, it calls for a stop-the-world, parallel minor GC, where all the domains collect their minor heaps.” If a domain collecting its minor heap includes moving entities to the major heap, and if one domain can trigger such a minor heap collection of another domain, how do the CAMLparam macros now work correctly? In theory there could be a period of vulnerability between a function beginning and setting up its stack frame and its value arguments, and the macro being applied to those arguments, caused by allocations of an entirely different domain.

Indeed if every domain has its own minor heap, what’s the point of one domain’s minor heap collection triggering another domain’s collection?

Quoted directly from the tutorial

Domains are heavy-weight entities. Each domain directly maps to an operating system thread.

The rest of the answers you seek are also there in the tutorial. The concepts are introduced in a piecemeal fashion. I encourage you to have a read.

For the CAMLparam macros, the discipline is the same as OCaml 4 – the user needs to ensure that allocation functions (caml_alloc*) are not called before the parameters are registered.

For the stop-the-world sections, OCaml 5 doesn’t stop a domain execution arbitrarily at any point (by interrupting the threads with signals, for example). Whenever a domain triggers a stop-the-world section, the other domains will participate in the stop-the-world barrier at their next allocation point. This ensures that the discipline followed for the correct use of CAMLparam macros in OCaml 4 is sufficient in OCaml 5.

Indeed if every domain has its own minor heap, what’s the point of one domain’s minor heap collection triggering another domain’s collection?

There could be cross minor-heap pointers. We cannot independently collect the minor heap of domains without handling this case.

OK I see, thanks. It would be good if the OCaml 5 equivalent of chapter 20 of the OCaml manual, when it comes out, were to make some comment along those lines, and I imagine it will do so.

This is an impressive body of work.

Here is a very simple tutorial on parallel programming in OCaml: use parany!

For OCaml 5, use the right branch of parany:

Happy hacking!
F.

Thanks for the tutorial, it is very useful!

I was wondering however if it is possible to kill a Domain.t, I have an application where I launch a few speculative Domains, but once I learn new information I’d like to kill that domain.

Will that be possible in OCaml 5?

To my understanding, you normally wouldn’t want to have a dynamic number of domains. You would have a static number of domains, started at application startup. Then you would run a dynamic number of fibres on the domains via some scheduling mechanism like Eio. The fibres can be killed, e.g. you can do speculative work in a fibre and then kill it. More on cancellation: GitHub - ocaml-multicore/eio: Effects-based direct-style IO for multicore OCaml

Domain.kill and similar asynchronous exceptions are hard to use correctly in the presence of resources. Hence, we do not provide or plan to provide such a mechanism for domains. I would suggest organising your code such that the speculative domains poll for the termination condition and voluntarily terminate.
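
A minimal sketch of that polling pattern, using only Atomic and Domain from the standard library, could look like the following. The function names and the stop and progress flags are invented for the example, and the "search" is just a counting loop standing in for real speculative work.

```ocaml
(* Hypothetical speculative worker: keeps going until [stop] is set.
   It polls the flag on every step; a real search would probably poll
   less often to keep the overhead low. *)
let speculative_search ~stop ~progress =
  let rec loop steps =
    if Atomic.get stop then steps        (* voluntary termination *)
    else begin
      Atomic.incr progress;
      loop (steps + 1)
    end
  in
  loop 0

let run () =
  let stop = Atomic.make false in
  let progress = Atomic.make 0 in
  let d = Domain.spawn (fun () -> speculative_search ~stop ~progress) in
  (* Stand-in for "learning new information": wait until the worker has
     made some progress, then ask it to stop. *)
  while Atomic.get progress < 1000 do
    Domain.cpu_relax ()
  done;
  Atomic.set stop true;
  (* The worker notices the flag at its next poll and returns normally. *)
  Domain.join d

let () = Printf.printf "worker stopped after %d steps\n" (run ())
```

Because the worker returns through its normal control flow, any locks or other resources it holds are released the usual way.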

Asynchronous exceptions and resources are known to be hard to get right. For example, the domain may hold a lock and get killed. There are various non-backwards-compatible asynchronous exception-safe coding practices (such as brackets). But it is still quite tricky. If this topic is of interest, I would recommend reading Asynchronous Exceptions in Haskell, which touches upon these issues.
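
For readers unfamiliar with the bracket idiom, here is a small sketch using Fun.protect and the Mutex module from the OCaml 5 standard library. The helper with_lock and the exception Cancelled are made-up names. Note the limitation: this covers exceptions raised inside the protected function, but an asynchronous exception arriving between Mutex.lock and Fun.protect would not be handled, which is part of why the topic is tricky.

```ocaml
let counter = ref 0
let lock = Mutex.create ()

(* Bracket: acquire the lock, run [f], and always release the lock,
   whether [f] returns normally or raises. *)
let with_lock m f =
  Mutex.lock m;
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

(* Hypothetical exception standing in for an interruption. *)
exception Cancelled

let () =
  (* The critical section is interrupted by an exception... *)
  (try with_lock lock (fun () -> incr counter; raise Cancelled)
   with Cancelled -> ());
  (* ...but the bracket released the lock, so it can be taken again. *)
  with_lock lock (fun () -> incr counter);
  Printf.printf "counter = %d\n" !counter
```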

I do not think it is true that adding asynchronous exception-safety features would be a problem for backwards-compatibility. Current programs not using asynchronous exceptions would continue running just fine because they would not use asynchronous exceptions, and in addition only the part of the program that needs to be interrupted needs to be made exception-safe. You might be thinking about making arbitrary programs ready for asynchronous exceptions, which of course is not realistic and is not a question that we are asking in general.

There are already lots of programs using asynchronous exceptions in OCaml, and it is sometimes not realistic to rewrite them to poll explicitly (as @ejgallego knows well). Once Memprof is available for OCaml 5, my library memprof-limits will provide ways and guidance to interrupt a domain on an arbitrary trigger, including in the presence of resources that need cleaning-up.

The Asynchronous Exceptions in Haskell paper is a nice work but I would not recommend it as a first read nor as a sole reference. There is a non-negligible delta between this paper and the practice of asynchronous exceptions in Haskell, and it omits the experience of other languages that have introduced key exception-safety concepts. I gave a bibliography of important resources on this topic at the end of the guide to program with interrupts and during my presentation at the OCaml Workshop 2021 (video).

Thanks. Looking forward to playing around with the memprof-limits library when it becomes available for OCaml 5. I’d be happy to be proven wrong.

Thanks for all the feedback @kayceesrk. Unfortunately, this means that OCaml 5.0 will still be unfit for a large variety of uses, mostly applications that would like to use domains to perform search. There it is indeed impractical to have, for example, a thread in a chess search engine poll, and that is assuming the developers of such existing code would agree to retrofitting it.

For example, for Coq, it is just either very costly or very brittle to add such checking points.

I will open an issue upstream to continue the discussion.

It seems clear to me that supporting pthread_kill is not easy in the context of OCaml. It is known how hard it was for Isabelle to get their Thread.kill right; however, as of today it works fine, so Isabelle can simply submit its proof checking tasks to its thread pool and cancel them happily if the document has changed. Lean and a few other systems have managed to implement true thread cancellation fine.

So, without being an expert here, it would be good to know more about what exact problem OCaml is facing here that virtually no other system with multithreading has as of today.

@kayceesrk The “arbitrary trigger” part is already there, this example is the one that will interest you. It is only missing an implementation of Memprof in multicore OCaml to be applicable to the interruption of multicore domains.

@ejgallego In this discussion we should distinguish one particular functionality that would apply to all domains (e.g. Domain.kill) from the ability to interrupt one domain if it is ready for it (which could be provided by other means). It sounds reasonable to me to not want a general Domain.kill. As for pthread_kill I doubt anything meaningful can be made out of it, this is not the same as raising an asynchronous exception that will clean up all resources on the way up. (I mistook pthread_kill for pthread_cancel; the former simply sends a signal to a specific thread, which can then be used to raise an asynchronous exception in a chosen domain.)

@ejgallego In this discussion we should distinguish one particular functionality that would apply to all domains (e.g. Domain.kill ) from the ability to interrupt one domain if it is ready for it (which could be provided by other means). It sounds reasonable to me to not want a general Domain.kill . As for pthread_kill I doubt anything meaningful can be made out of it, this is not the same as raising an asynchronous exception that will clean up all resources on the way up.

Indeed, the functionality I need is basically to be able to terminate / interrupt a potentially uncooperative thread, for example because a tactic went into an infinite loop. That’s basic functionality in most multithreaded applications, such as games, search engines, SMT solvers, ML reinforcement agents, etc. I am not in the case of event-based IO, which is handled fine by many of the existing solutions in OCaml.

I should also note that I’m fine with having to face the risk / burden of resource cleanup. In particular, I’m fine to use for example pthread_cancel and manage the interruptibility status manually.

in particular I’m fine to use for example pthread_cancel and manage the interruptibility status manually.

Now this will not work because there is too much state to keep track of in the OCaml runtime to possibly set (pthread-)cancellation clean-up handlers correctly. You would end up with a runtime in a corrupt state. Let’s stay on the idea of using asynchronous exceptions :)

Yes, async exceptions seem the way to go; I suggest we continue the discussion at the OCaml bug tracker; cf. Support for `pthread_cancel / Domains.cancel` in OCaml 5.0 · Issue #11411 · ocaml/ocaml · GitHub