Multicore OCaml vs Thread

This is not a silly question at all. Allow me to refine your definitions a little to make them more accurate.

Concurrency is how we partition multiple computations so that they can run in overlapping time periods rather than strictly sequentially. OCaml already has excellent support for concurrency in a single heap, for example explicitly via user libraries such as Lwt or Async. The venerable OCaml Thread module is also a fine way to express concurrent computations, as each of them runs with its own thread of control flow.

Parallelism is the act of running concurrent computations simultaneously, primarily by using multiple cores on a multicore machine.

You can get a certain amount of speedup using just concurrent programming, for example in I/O-intensive operations, where your program should not sit waiting for one connection to receive data while other connections have data ready to process. This is the primary use case of both Lwt and Async. If you don't want to pepper your program logic with monads, the Thread module accomplishes a similar task (Dune does this, for example).
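As a minimal sketch of that Thread-based style: the `fake_io` function below is a hypothetical stand-in for a blocking call such as a socket read, and `run_concurrently` is a name invented for this illustration. It assumes the program is linked against the threads library (for example the `threads.posix` findlib package).

```ocaml
(* Build with the threads library, e.g.
   ocamlfind ocamlopt -package threads.posix -linkpkg example.ml *)

(* Hypothetical stand-in for a blocking I/O call such as reading a socket. *)
let fake_io id =
  Thread.delay 0.05;  (* suspends this thread only; the others keep running *)
  id * 10

(* Start one thread per id, wait for all of them, and collect the results. *)
let run_concurrently ids =
  let results = Array.make (List.length ids) 0 in
  let threads =
    List.mapi
      (fun slot id ->
        (* Each thread writes to its own slot, so no extra locking is needed. *)
        Thread.create (fun () -> results.(slot) <- fake_io id) ())
      ids
  in
  List.iter Thread.join threads;
  Array.to_list results

let () =
  run_concurrently [1; 2; 3] |> List.iter (Printf.printf "%d\n")
```

The three waits overlap, so the total time is close to one delay rather than three, even though only one OCaml thread executes OCaml code at any instant.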

However, these all still run on a single core until you unlock parallelism in the runtime. The Thread module takes a lock on the runtime when an OCaml thread is scheduled, which prevents any other OCaml thread from running at the same time. C threads can still run in parallel just fine; it is only the OCaml runtime that is locked against parallel execution.

Thus were born many of the solutions listed above. In order to obtain parallelism, they run multiple OCaml runtimes, for example in different processes, and use communication channels to share data between them. Data structures have to be marshalled across processes or otherwise carefully shared using IPC primitives, but it works pretty well for many use cases. The key drawback is that data structures cannot be shared seamlessly across runtimes, so you need to do some extra work in your application.
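To make the marshalling cost concrete, here is a small sketch using the standard library's `Marshal` module. The `job` type and the `send`/`receive` helpers are hypothetical names for this example, and the round trip happens inside one process purely for illustration; a real multi-process setup would write the bytes to a pipe or socket instead.

```ocaml
(* Hypothetical message type exchanged between two OCaml runtimes. *)
type job = { id : int; payload : string list }

(* Serialise a value into bytes that could be written to a pipe or socket. *)
let send (j : job) : string = Marshal.to_string j []

(* Rebuild a value from those bytes. Marshal is not type-safe, so the
   annotation tells the compiler what we expect to find. *)
let receive (bytes : string) : job = Marshal.from_string bytes 0

let () =
  let original = { id = 1; payload = ["a"; "b"] } in
  let copy = receive (send original) in
  assert (copy = original);   (* structurally equal ... *)
  assert (copy != original)   (* ... but a separate copy, not shared memory *)
```

The receiver always gets a fresh copy, so a later mutation on one side is invisible to the other unless it is sent again. That is the extra work referred to above.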

Which leads on to multicore OCaml: you can use Domains as units of parallelism, and explicitly manipulate shared data structures in a single heap using multiple processors simultaneously. The current push for multicore OCaml does not advance the state of concurrency for the first chunk of upstreaming, but provides the runtime primitives (mostly via the Domain module) to start parallel flows of control. Improving the state of concurrency is an entirely separate development effort that we're working on too.
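A rough sketch of what this looks like, assuming the `Domain` and `Atomic` modules as they ship in the OCaml 5 standard library (`count_hits` is a name invented for this example): several domains update one counter that lives in the shared heap, with no marshalling involved.

```ocaml
(* Requires OCaml 5.0 or later. *)

(* Spawn [domains] parallel workers that each increment one shared counter
   [per_domain] times, then return the final count. *)
let count_hits ~domains ~per_domain =
  (* One value in the single shared heap, visible to every domain. *)
  let hits = Atomic.make 0 in
  let worker () =
    for _ = 1 to per_domain do
      Atomic.incr hits  (* atomic update, safe under true parallelism *)
    done
  in
  let running = List.init domains (fun _ -> Domain.spawn worker) in
  List.iter Domain.join running;
  Atomic.get hits

let () =
  Printf.printf "%d\n" (count_hits ~domains:4 ~per_domain:10_000)
```

The counter is an `Atomic.t` rather than a plain `ref` because the domains really do run at the same time, and unsynchronised increments on a `ref` could lose updates.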
