Parallel performance of an OCaml program: ocaml-5.0.0 vs. 4.13.1

Perhaps the important bits are these.

Concurrent minor GC, which permits minor heaps of domains to be independently collected, requires a read barrier.

From section 1.2,

While this allows users to write fast code, the API also bakes in the invariants of the GC. For example, reading any OCaml object, be it mutable or not, using the C API does not involve a read barrier, and is compiled as a plain read of memory. A new GC scheme that adds a read barrier only to reads of mutable fields will need to deprecate the old API or suffer the risk of breaking code silently. Either way, the users will have to modify their code to work correctly under the new GC. Given that the compiler does not check incorrect uses of the C API, it is already difficult to write correct and efficient FFI code. We would like to strike a balance between the added complexity of the C API and the performance impact of the missed opportunities.

Essentially, the Field macro would need to include a read barrier and would also, critically, be a GC safe point where the GC can move objects. This is not the case in sequential OCaml. As a result, Field(v,i) could not be an l-value either. Users would not only need to change their code but would also need to carefully audit all uses of the C FFI to ensure that their assumptions about when the GC runs still hold. It may be possible to design a clever static analysis of the C source code that handles the easy safety cases, but this is an open research question.
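To make the safe-point problem concrete, here is a toy model in C. It is a sketch under stated assumptions, not the OCaml runtime: `read_field`, `write_field`, `minor_gc`, `register_root` and the two static heaps are all hypothetical, and the toy "GC" runs on every read purely for illustration. It mimics the concurrent minor GC scheme described above, where a field read is also a point at which objects can be moved. A raw pointer cached across such a read goes stale unless it is registered as a root (the role `CAMLparam`/`CAMLlocal` play in the real C API), and because every read is a function call, there is no l-value form like `Field(v, i) = x`; writes need their own entry point (in the real API, `Store_field`).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, NOT the OCaml runtime API. */

#define HEAP_SLOTS 8
#define MAX_ROOTS 8

typedef struct obj {
    long field[2];
    struct obj *fwd; /* forwarding pointer, set once the object is promoted */
} obj;

static obj minor_heap[HEAP_SLOTS];
static obj major_heap[HEAP_SLOTS];
static int minor_used = 0, major_used = 0;

static obj **roots[MAX_ROOTS]; /* addresses of C locals the GC may rewrite */
static int n_roots = 0;

/* Similar in spirit to CAMLlocal: tell the GC where a live pointer is stored. */
static void register_root(obj **slot) {
    assert(n_roots < MAX_ROOTS);
    roots[n_roots++] = slot;
}

static obj *alloc_minor(long a, long b) {
    assert(minor_used < HEAP_SLOTS);
    obj *o = &minor_heap[minor_used++];
    o->field[0] = a;
    o->field[1] = b;
    o->fwd = NULL;
    return o;
}

/* Equality comparisons only, so this is well defined for any pointer. */
static int in_minor(const obj *o) {
    for (int i = 0; i < HEAP_SLOTS; i++)
        if (o == &minor_heap[i]) return 1;
    return 0;
}

/* Copying minor GC: promote every rooted minor object, rewrite the roots,
   then poison the minor heap (fields set to -1) so stale pointers show up. */
static void minor_gc(void) {
    for (int r = 0; r < n_roots; r++) {
        obj *o = *roots[r];
        if (o != NULL && in_minor(o)) {
            if (o->fwd == NULL) {
                assert(major_used < HEAP_SLOTS);
                obj *copy = &major_heap[major_used++];
                copy->field[0] = o->field[0];
                copy->field[1] = o->field[1];
                copy->fwd = NULL;
                o->fwd = copy;
            }
            *roots[r] = o->fwd;
        }
    }
    for (int i = 0; i < minor_used; i++) {
        minor_heap[i].field[0] = minor_heap[i].field[1] = -1;
        minor_heap[i].fwd = NULL;
    }
    minor_used = 0;
}

/* Toy read barrier: the load itself is plain, but the call is also a safe
   point where (in this model, unconditionally) a minor GC runs. */
static long read_field(obj *o, int i) {
    long v = o->field[i];
    minor_gc();
    return v;
}

/* Writes need a separate function: `read_field(o, 0) = 5` does not compile. */
static void write_field(obj *o, int i, long v) { o->field[i] = v; }
```

For example, with `b` registered via `register_root(&b)`, calling `read_field(b, 0)` returns the old value and leaves `b` pointing at the promoted copy, while an unregistered pointer that was allocated in the minor heap now reads the poison value -1.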

OTOH, we experimentally validated that the stop-the-world parallel minor GC performs as well as (and in some cases better than) the concurrent minor GC (see results in Fig 11 and Fig 12). So our initial fears that the stop-the-world GC would fare quite poorly against the concurrent collector were not well founded. Of course, the caveat is that we did not test in over-committed environments.

I should say that the doors are not shut to innovation here. There are a number of things we can try to improve performance. Many of the criticisms raised on the Discuss forum are valid. Given that the problems are challenging and resources are limited, we will need to prioritise them and work through them.

What we’ve done in OCaml 5.0 is to ensure that we haven’t made any design choices that change the semantics: no breakages. This allows the community to move to multicore easily and take advantage of parallelism with well-understood limitations. We can now start looking at more interesting designs that might break APIs, if the performance improvements are worth it. If we had broken the APIs and still had limitations, I think our users would have been more cross than they are now :)
