Parallel performance of an OCaml program: ocaml-5.0.0 Vs 4.13.1

I was always told “fork is faster than smp”, almost as conventional wisdom. It’s nice to see it visualized!

1 Like

BTW I just remembered something I wrote two years ago: it’s for the “programming language drag race” and I wrote it because OCaml’s score felt way too low. I never got around to actually submitting it, though. It further confirms this observation (but not visually):

(feel free to tear apart :grin:)

2 Likes

I do not read too much into the chart. There could be improvements to make to the program, or to the multicore runtime implementation, or both.

However, I often see the inherent synchronization costs of SMP runtimes proposed as the explanation. This should not be used to dismiss legitimate questions and explorations about the consequences of the design choices of the multicore runtime (for instance, a stop-the-world collector that empties all minor heaps whenever one minor heap is full).

3 Likes

I think it is a valid point. I very much do not embrace the « choose the right architecture for the job » argument here. If one has to resort to forks for a simple, performant map/reduce, it means the OCaml multicore abstraction leaks (time-wise).

It would be interesting to understand the Intrinsic Parallel Efficiency of an OCaml program with respect to, at least, the GC (minor and major) and the major allocator.

As far as I understand, the minor GCs 1) work in lock-step and are, as a consequence, 2) tail-sensitive (to the most-allocating domain among N). Also, the major allocator is a global parallel allocator which has to cope with generational copying.

Intrinsic Parallel Efficiency: maybe one way to characterize it would be the following (a rough code sketch follows the formulas below):

  • an embarrassingly parallel program (map only, N partitions), no sync, with a sequential version that is as identical as possible (easy for a map)
  • some functions whose computation is designed to generate, per run, an increasing number of minor GCs/sec (no major)
  • a variant that increasingly spills over to the major heap
  • a variant with unbalanced allocation (one allocation-heavy tail among N)
  • run on the single-threaded runtime (4.x) vs the multicore runtime (5.x)

PE (Parallel Efficiency): sequential-time(workload of size N) / (parallel-time(one size-1 partition) * N)
SR (Sequential Regression): sequential-time on 5.x / sequential-time on 4.x (assuming no other unrelated optimization)
IPE of the OCaml multicore runtime: PE * SR
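
For concreteness, here is a rough sketch of what such a kernel and its bookkeeping could look like (my own illustration, not anyone’s actual benchmark; it assumes domainslib, all names and timings are made up, and the PE computation is one reading of the formula above):

```ocaml
(* Embarrassingly parallel map with a tunable minor-GC pressure knob:
   [allocs] controls how many short-lived lists each element allocates. *)
let kernel ~allocs x =
  let acc = ref x in
  for _ = 1 to allocs do
    acc := List.fold_left ( + ) 0 [ !acc; 1; 2; 3 ]  (* dies young *)
  done;
  !acc

let sequential ~allocs a = Array.map (kernel ~allocs) a

let parallel ~num_domains ~allocs a =
  let open Domainslib in
  (* the pool also uses the calling domain, hence num_domains - 1 *)
  let pool = Task.setup_pool ~num_domains:(num_domains - 1) () in
  let b = Array.make (Array.length a) 0 in
  Task.run pool (fun () ->
      Task.parallel_for pool ~start:0 ~finish:(Array.length a - 1)
        ~body:(fun i -> b.(i) <- kernel ~allocs a.(i)));
  Task.teardown_pool pool;
  b

(* Worked example of the metrics, with made-up timings (seconds):
   sequential 5.x = 120, sequential 4.x = 110, parallel 5.x on 8 domains = 18. *)
let n = 8.0
let pe = 120.0 /. (18.0 *. n)  (* PE  ~ 0.83 *)
let sr = 120.0 /. 110.0        (* SR  ~ 1.09 *)
let ipe = pe *. sr             (* IPE ~ 0.91 *)
```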

Also mentioned in another discussion about domains: the tail latency (w.r.t. lock-steps) induced by thread scheduling, which may suggest a sensitivity to noisy neighbors / too many domains. What is not part of IPE: synchronisation costs, cache costs, NUMA, and parallel-algorithm efficiency, as those are concerns regardless of the language.

And to end on a definitely positive note: effects and local DRF will allow for a great parallel programming model (green threads, sync primitives, performant and bug-free lock-free algorithms, actor/CSP models?…)

It is still early days; the OCaml team has done an incredible job with multicore, and we will all benefit from gaining an understanding of the characteristics and the inevitable trade-offs made during this journey.

3 Likes

If you have proper statistics of a parallel implementation comparing ocaml-5 to ocaml<5, it would be interesting to share them.

The program is parallelized using parany, so in the 4.13.1 case, the parallelism is fork-based.
In 5.0.0, it uses Domains and Domainslib.Chan.
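
For readers who have not used them yet, a minimal sketch of that Domains + Domainslib.Chan shape (not the actual benchmark code; all names here are made up) could look like this:

```ocaml
(* A fixed set of worker domains pull jobs from one unbounded channel and
   push results to another; [None] is the end-of-input marker.
   Assumes nworkers >= 1; results may come back out of order. *)
let run_parallel ~nworkers ~work items =
  let open Domainslib in
  let jobs = Chan.make_unbounded () in
  let results = Chan.make_unbounded () in
  let worker () =
    let rec loop () =
      match Chan.recv jobs with
      | None -> ()
      | Some x -> Chan.send results (work x); loop ()
    in
    loop ()
  in
  let domains = List.init nworkers (fun _ -> Domain.spawn worker) in
  List.iter (fun x -> Chan.send jobs (Some x)) items;
  List.iter (fun _ -> Chan.send jobs None) domains;
  let out = List.map (fun _ -> Chan.recv results) items in
  List.iter Domain.join domains;
  out
```

The fork-based setup has the same overall shape, with worker processes instead of domains.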

Sorry, I didn’t spend much time digging, but is there a simple way to replicate your test?

Ideally some git repo that one could clone and then run some shell script(s) to run it within a switch that has OCaml 4 or OCaml 5.

My particular interest is to take a quick look at the OCaml 5 version and see if there is potentially some simple way to make it faster or to identify particular things to improve in OCaml 5. I make no promises, however. :smiley:

I need to share my test data file, that’s the only obvious thing missing.
This is a regular opam package, so installing it should be a matter of opam pin
to the git repository.
I am not particularly interested in people trying to optimize the program.
Rather, I would be more interested in stories of other parallel programs which have an ocaml<5 and an ocaml-5 version.

I am not particularly interested in people trying to optimize the program.

Hot take: the problem with your benchmark is that it is not clear that you used Domains correctly, or whether you made programming mistakes that limit the scalability of the Domains version. For example, you mention that the benchmark displays on the x axis the number of compute domains, but that there are also 2 mostly-idle domains used for control. This is a performance antipattern for Domains programming, and it is hard to know whether this invalidates the performance results or not. I have already made this point last week: No Domain.maximum_domain_count() in the stdlib - #28 by gasche.
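
To make this concrete, here is roughly the shape I mean, as a sketch rather than a fix to your program: size the worker pool with Domain.recommended_domain_count () and let the main domain take a full share of the work, instead of keeping mostly-idle control domains around.

```ocaml
(* Hypothetical example: summing an array in parallel without any extra
   control domains. The total number of domains (including the main one)
   never exceeds Domain.recommended_domain_count (). *)
let process_chunk a lo hi =
  let s = ref 0 in
  for i = lo to hi - 1 do s := !s + a.(i) done;
  !s

let parallel_sum a =
  let n = Array.length a in
  let d = min (Domain.recommended_domain_count ()) (max 1 n) in
  let chunk = (n + d - 1) / d in
  let bounds i = (i * chunk, min n ((i + 1) * chunk)) in
  (* spawn d - 1 workers; chunk 0 is processed by the main domain itself *)
  let workers =
    List.init (d - 1) (fun i ->
        let lo, hi = bounds (i + 1) in
        Domain.spawn (fun () -> process_chunk a lo hi))
  in
  let lo0, hi0 = bounds 0 in
  let here = process_chunk a lo0 hi0 in
  List.fold_left (fun acc w -> acc + Domain.join w) here workers
```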

This benchmark is a comparison of a finely-tuned multi-process version developed over time by a domain expert (you) and a hastily-, naively-written multi-domain version developed by a newbie (you). Good luck interpreting that.

What I gather from this interaction and your experiment is that:

  • It may very well be the case that, for many sorts of embarrassingly-parallel problems, multiprocess concurrency works just as well as, or better than, shared-memory concurrency. Others have said this earlier in the thread. In fact, the whole OCaml community has been saying this for the last 20 years: before the Multicore OCaml effort, we had clear explanations of the fact that the absence of a shared-memory concurrent runtime was not an obstacle to good parallel performance for some common categories of programs. These explanations remain valid now that considerable work has been poured into moving to a concurrent runtime.
    It is still interesting to get yet another confirmation of this (modulo the doubts about your Domains-using code which may be too naive), and maybe it can help recalibrate people’s expectations about the performance benefits of Multicore.
    This also explains why upstream-OCaml insisted so much on compatibility of sequential performance, imposing many difficult/painful performance constraints onto the Multicore OCaml developers. If multiple sequential processes are faster for your use-case, then just keep using that with OCaml 5.

  • The programming model for Domains is not as easy as it looks from a distance. People like @UnixJunkie with previous experience in parallel programming (much more than I have) seem to get it wrong. There are two things to distinguish here:

    • The current libraries exposed for using Domains in the stdlib and elsewhere are young and barebones, so it is to be expected that they are not so easy to use; for now you need Domains expertise to use them well. We will develop better libraries/abstractions over time, broadening the share of OCaml programmers able to use them easily/confidently. (Parany is a step in the right direction from this perspective, assuming you can grow Multicore expertise or get help from another expert.)
    • The programming model is harder than it looks, due in part to design choices made mid-way through Multicore development for retro-compatibility reasons (which have their own strong benefits). This is something that I personally only realized somewhat late in the release process for OCaml 5.0 (I used to assume that “don’t use many domains” meant “100 is wasteful but okay in practice”), and I think that there are aspects of this that we are not yet collectively aware of.
      In particular, the performance on contended systems (such as my laptop devoting a lot of compute time to Firefox tabs) is not very well understood, and may disappoint or thwart “simple” approaches to domain management and efficient concurrent code. (I will not be surprised when I meet the first heavily-optimized parallel program full of busy-waiting loops that does splendidly on controlled benchmarks and erratically becomes dog slow on my actual machine.) We will probably have more slightly-disappointing surprises down the road; let’s manage our expectations accordingly.
5 Likes

Note: we know that multi-processing can be as fast as or faster than multi-domain programming, but ideally multi-domain programming should not be too much slower. Given good benchmarks that show a larger-than-expected performance difference, we could study the multi-domain performance and try to improve the Multicore runtime – surely there are many areas for improvement left.

It is not clear to me that it’s a valuable use of our time to discuss bad benchmarks such as yours. On one hand, maybe we should try to focus first on code that follows the recommended Domain programming advice. On the other hand, people in the wild write wild code, and it’s less hassle for everyone if that also runs fast enough. My personal intuition is that we should still focus on “good” benchmarks first, and figure out library designs that make it easy to write “good” programs.

I haven’t gathered meaningful statistics, unfortunately. I remember this was somewhere between 4.10/4.10+domains and 4.12, respectively. So, by that time, multicore was still a WIP, but it was nearing the end of its development. I’m curious whether 4.14’s prefetching would improve this workload, whether the released multicore fares better, or whether using e.g. domainslib can be more optimal.

The files are self-contained, low-dependency, and near-identical, and the measurement code is in the Makefile, so hopefully it’s easier to reproduce and improve for anyone interested.

I wanted to functorize but that seems to have made the compiler lay out the code slightly worse and created a bit of a slowdown.

Just one thing to note here. There’s a fair bit of work to do, but we can probably alleviate the worst of the issues around having more domains than cores; it’s certainly not all set in stone.

One option is to have some number of GC domains (which might be dynamic from collection to collection) that’s smaller than the actual number of domains, and each GC domain does the work for one or more running domains.

The one bit I don’t think we can change (without breaking the C API) is that we need to wait for all domains to hit a safe point, which means that if you’ve got vastly more domains than cores (or are on a heavily loaded system) you might be waiting a while at every minor collection.

It would be interesting to establish the contribution of each factor to the drop in performance when there are more domains than cores.

1 Like

Note: it took my beginner self a long time to understand that the “waiting a while” occurring here is not just “you have to wait for each domain to reach a poll point, so the more domains, the longer the maximal wait”, but rather “the OS may deschedule one of the domains, and everyone will be blocked for two or three orders of magnitude longer”.

3 Likes

In the early days of virtualization in enterprise TP, people would run Java apps that used Paxos and Lamport-Liskov distributed systems (hence required periodic heartbeat messages for liveness) in VMs that overcommitted memory on their host, that is, with more VM-declared memory than actual memory on the host machine. Boy howdy, so much fun when the VM’s pages got paged out by the host: since the host didn’t know any better, it could and would page out the pages associated with the threads that were responsible for the heartbeat.

Even with multi-minute heartbeat timeouts, many significant users of these systems saw all sorts of craaazy breakage as VMs paged-out, were declared dead, then came back and responded, were declared alive, etc, etc, etc.

1 Like

Was it considered to empty just the minor heap which is full?

Can this behavior be changed via an environment variable?

For a parallel program, blocking all threads is a very bad thing (it’s called a barrier).
I guess, if only a single heap is emptied, this barrier could be shorter, which might be a good thing for parallel performance.

I don’t think people are asking to have many more domains than cores.
There is a recommended value, fine.

I’m asking this question b/c I don’t know the answer, and maybe it’ll be enlightening for other readers. When you say a “safe point”, I’m reminded of the “safe points” in other multi-thread-friendly GCed runtimes, like for some JVMs, where “safe point” means that a thread has parked all its out-of-representation pointers in known in-representation locations, so that a GC can move around the stuff those pointers reference. And this happens pretty frequently – it isn’t something that you have to wait forever for. Is this the same meaning of “safe point” that you’re using in the multicore runtime? Or is something different meant?

I would make a distinction between a “safe point”, where we know that we can run a GC, and a “poll point”, where we actually check whether we need to run a GC (or handle other asynchronous events: signals, monitoring callbacks, etc.). Poll points occur at each allocation, which happens very often indeed, and some extra poll points are added on loop back-edges to (try to) guarantee that we never wait forever.
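
A toy illustration of the difference (not taken from the runtime or from any benchmark):

```ocaml
(* [spin] never allocates, so the only poll points it reaches are the ones
   the compiler inserts for it (on the recursive call / loop back-edge);
   [churn] allocates on every iteration, so it polls at each allocation.
   Either way, each domain can respond promptly when the other one
   requests a stop-the-world minor collection. *)
let rec spin n = if n = 0 then () else spin (n - 1)

let churn n =
  let r = ref [] in
  for i = 1 to n do r := i :: !r done;
  List.length !r

let () =
  let d = Domain.spawn (fun () -> spin 200_000_000) in
  ignore (churn 1_000_000);
  Domain.join d
```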

Yes. This was our earlier design, which allows the minor heap arenas of each domain to be independently collected. It breaks the C API. The details are in the ICFP 2020 paper, Retrofitting Parallelism onto OCaml (ICFP 2020).

It should be clarified that the barrier itself is not a problem. The latest research in Java land, which has a long history of parallel GCs (LXR: Low-Latency, High-Throughput Garbage Collection, PLDI 2022, distinguished paper award winner), shows that brief stop-the-world sections work surprisingly well for complex tasks (such as evacuation) when compared to state-of-the-art, industrial-strength, pauseless (concurrent) compacting collectors such as Shenandoah and ZGC. From the LXR paper:

LXR’s design premise is that regular, brief stop-the-world collections will yield sufficient responsiveness and far greater efficiency than concurrent evacuation

The stop-the-world sections in OCaml 5 do a limited amount of work (bounded by the size of the minor heap). The stop-the-world minor GC is parallel and copying – only the live objects are touched (unlike a mark-and-sweep collector, which needs to touch all objects whether live or dead). And typically the survival rate in the minor collection is less than 10%.
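
If you want to eyeball that survival rate on your own workload, here is a rough single-domain sketch (an illustration only; the <10% figure above is not derived from it):

```ocaml
(* promoted_words / minor_words over a run of [f] approximates the fraction
   of minor-heap allocation that survived into the major heap. *)
let survival_rate f =
  let before = Gc.quick_stat () in
  f ();
  let after = Gc.quick_stat () in
  let minor = after.Gc.minor_words -. before.Gc.minor_words in
  let promoted = after.Gc.promoted_words -. before.Gc.promoted_words in
  if minor = 0.0 then 0.0 else promoted /. minor

let () =
  let rate =
    survival_rate (fun () ->
        (* mostly short-lived allocations, so a low survival rate is expected *)
        for _ = 1 to 100_000 do
          ignore (Sys.opaque_identity (List.init 10 Fun.id))
        done)
  in
  Printf.printf "survival rate: %.1f%%\n" (100.0 *. rate)
```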

Overall, the design of the GCs is not the problem here. It is the issue of over-commitment. There are solutions in GHC where the number of threads that participate in the GC is set to a number different from the number of HECs (analogous to domains). These would be useful to explore before attempting to switch to concurrent minor collection designs where each domain can independently collect its minor heap. Such a GC breaks an awful lot of libraries in the ecosystem, and they are not easy to fix (no sed script will work). This is a non-trivial amount of work.

I had a look at your article. I just glanced through it, to be honest; this is a very specialized article.
I understand that all the constraints you had to satisfy are very hard to meet and that there were some tough choices and compromises.

Perhaps the important bits are these.

Concurrent minor GC, which permits minor heaps of domains to be independently collected, requires a read barrier.

From section 1.2,

While this allows users to write fast code, the API also bakes in the invariants of the GC. For example, reading any OCaml object, be it mutable or not, using the C API does not involve a read barrier, and is compiled as a plain read of memory. A new GC scheme that adds a read barrier only to reads of mutable fields will need to deprecate the old API or suffer the risk of breaking code silently. Either way, the users will have to modify their code to work correctly under the new GC. Given that the compiler does not check incorrect uses of the C API, it is already difficult to write correct and efficient FFI code. We would like to strike a balance between the added complexity of the C API and the performance impact of the missed opportunities.

Essentially, the Field macro needs to include a read barrier and also becomes, critically, a GC safe point where the GC can move objects. This is not the case in sequential OCaml. As a result, Field(v,i) cannot be an l-value either. Users will not only need to change their code but will also need a careful audit of all C FFI uses in order to ensure that the assumptions about when GCs run aren’t broken. It is possible that we can design a clever static analysis tool over the C source code to analyse the easy safety cases. But this is an open research question.

OTOH, we experimentally validated that the stop-the-world parallel minor GC performs as well as (and in some cases better than) the concurrent minor GC (see results in Fig 11 and Fig 12). So our initial fears that stop-the-world GC would fare quite poorly against the concurrent collector were not well founded. Of course, the caveat is that we are not testing under over-committed environments.

I should say that the doors are not shut for innovation here. There are a number of things that we can try to improve the performance. Many of the criticisms found in the discuss forum are valid. Given that the problems are challenging and the resources are limited, we will need to prioritise and work on these.

What we’ve done in OCaml 5.0 is to ensure that we haven’t made any design choices that change the semantics – no breakages. This allows the community to move to multicore easily, and to take advantage of parallelism with well-understood limitations. We can now start looking at more interesting designs that might potentially break APIs if the performance improvements are worth it. If we had broken the APIs and imposed limitations, I think our users would have been more cross than they are now :slight_smile:

10 Likes