Significant performance difference between OCaml and F#

UnixJunkie · July 15, 2021, 12:34am

This is bad news for the concurrent runtime then.
And yes, parallel programs that scale well try to reduce inter-process communications to the bare minimum.

UnixJunkie · July 15, 2021, 12:39am

There are gmp bindings for OCaml I guess. You are not forced to use zarith.

UnixJunkie · July 15, 2021, 12:44am

How do you measure cache misses, and how do you know which part of the code is responsible for them?

c-cube · July 15, 2021, 2:27am

It is not bad news; a lot of workloads are not trivially
parallelisable. If parany is enough for you, you’re just lucky.

silene · July 15, 2021, 5:19am

Sure. As explained in a later comment of this thread, pidigits switched from zarith to mlgmpidl more than one year ago. The OCaml code is now much less idiomatic, but on par with the other languages performance-wise.

There are lots of tools. My preferred ones are cachegrind and perf. On one hand, cachegrind is extremely precise, but it makes the tested programs unbearably slow. On the other hand, perf does not impact the speed of the tested programs, but locations of cache misses are a lot less precise.

gasche · July 17, 2021, 6:48am

@gndl’s parallel Re version was just accepted by Isaac Gouy, regex-redux OCaml #3. On the regex-redux page, the previous best OCaml solution was 19x slower than the best solution overall; the new best OCaml solution is now only 3.1x slower. This is impressive given that the new solution is using a pure-OCaml regex matching library.

Thanks @gndl!

XVilka · July 19, 2021, 5:10am

Looks like the only big difference left is k-nucleotide:

source	secs	mem	gz	busy	cpu load
F# .NET	3.65	184,680	1907	12.58	94% 84% 85% 83%
OCaml	15.09	255,396	1833	40.72	38% 94% 64% 74%

mmottl · July 22, 2021, 5:40pm

I’ve just added JIT-compilation support to pcre-ocaml (should be in OPAM soon). JIT-compiled patterns will typically see substantial speedups. Though this is also the case for the benchmark here, the ocaml-re version is still somewhat faster. That’s actually quite impressive if you think about it, but may not be the case for other benchmarks, of course.

In any case, users of pcre-ocaml are highly encouraged to try out JIT-compilation by adding the ~jit_compile:true flag when creating regular expressions.

mmottl · July 22, 2021, 5:48pm

Thanks, it’s certainly a great speedup! I’ve noticed that there is only one process performing the variant counting. I haven’t looked at the machine specs, but it may be substantially faster to match each variant in a separate process. Of course, multicore will eventually make all of this much easier.

gndl · July 28, 2021, 10:20pm

Thank you very much @mmottl for your responsiveness.
By parallelizing the variants counting with forks, I obtain the following results (lower is better):

Impl	sequential	parallel	parallel count
Pcre	15	10	7.3
Pcre JIT	6.1	4.4	4.3
Re	3.8	2.3	2.25

The performance gain obtained with JIT-compilation is impressive!
All in all, the benchmarksgame can prove to be beneficial.

On the other hand, an unfortunate consequence of this bench would be to suggest that Re is better than Pcre in all circumstances. As the last line of the Re README says, Re is only beneficial if regexps are applied a large number of times, which is the case here but not necessarily representative of the majority of regexp use cases.

The gain obtained thanks to the parallelization of the variants counting is low because the execution time of the replace task is close to that of the variants counting executed sequentially, therefore the duration of the program is approximately that of the replace task.
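To make that reasoning concrete, here is a tiny model, assuming the wall-clock time of tasks running in separate processes is roughly the time of the slowest one. The timings are hypothetical values chosen for illustration, not measurements from the benchmark.

```ocaml
(* Rough model: tasks run in parallel processes, so the wall-clock time is
   about the time of the slowest task (fork and pipe overhead ignored). *)
let parallel_wall_time task_times = List.fold_left max 0.0 task_times

(* HYPOTHETICAL timings in seconds, for illustration only. *)
let replace_time = 2.0
let count_times = [0.75; 0.75; 0.75]  (* three hypothetical counting tasks *)

(* Counting done sequentially in one process, next to the replace task. *)
let count_sequential = List.fold_left ( +. ) 0.0 count_times
let before = parallel_wall_time [replace_time; count_sequential]

(* Each counting task in its own process: the replace task now dominates. *)
let after = parallel_wall_time (replace_time :: count_times)

let () = Printf.printf "before: %.2fs, after: %.2fs\n" before after
```

In this model the counting work gets three times faster, but the program only goes from 2.25s to 2.00s because the replace task sets the floor.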

In the case of Re, the replace_string function uses the Buffer module to concatenate the substrings. A Rope structure might be more efficient for this operation.
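For readers unfamiliar with the idea, here is a minimal sketch of a rope, using only the standard library. It is not what Re does, and the type and function names are made up for this example. Whether it would actually beat Buffer here would have to be measured.

```ocaml
(* A minimal rope: concatenation is O(1) and the final string is built
   once, with a single allocation of the right size. *)
type rope =
  | Leaf of string
  | Node of rope * rope * int  (* left, right, cached total length *)

let length = function
  | Leaf s -> String.length s
  | Node (_, _, n) -> n

let of_string s = Leaf s

(* Constant-time concatenation; empty sides are dropped. *)
let concat a b =
  match length a, length b with
  | 0, _ -> b
  | _, 0 -> a
  | la, lb -> Node (a, b, la + lb)

(* Flatten left to right with an explicit work list, so that deep ropes
   built by many successive appends do not overflow the stack. *)
let to_string rope =
  let out = Bytes.create (length rope) in
  let rec go pos = function
    | [] -> ()
    | Leaf s :: rest ->
      Bytes.blit_string s 0 out pos (String.length s);
      go (pos + String.length s) rest
    | Node (left, right, _) :: rest -> go pos (left :: right :: rest)
  in
  go 0 [rope];
  Bytes.unsafe_to_string out
```

The difference with Buffer is that nothing is copied while pieces are being appended; every byte is copied exactly once in to_string, whereas a Buffer may copy its contents again each time it grows.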

hyphenrf · July 7, 2022, 10:52am

Sorry for the necro but this thread is probably the most appropriate and most recently discussed place for me to point out that the reverse complement submissions and k-nucleotide ones are all failing to build because of API deprecations.

if anyone’s interested in updating them : P

gadmm · July 7, 2022, 12:24pm

How about not breaking programs in the first place, but instead finding other ways of discouraging the future use of words that are deemed archaic by some but otherwise work perfectly fine?

For instance moving them to the end of the interface file, perhaps hiding them from the documentation using the special “stop” ((**/**)) separator, if the deprecation warning is deemed not enough.
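As a sketch of what this could look like in an interface (the module names and the choice of functions are hypothetical, used only for illustration):

```ocaml
(* Hypothetical interface: the old name stays available, but sits at the
   end of the signature, after the documentation "stop" comment, and
   carries a deprecation attribute. *)
module type CASE = sig
  (** ASCII-only lowercase conversion. This is the preferred name. *)
  val lowercase_ascii : string -> string

  (**/**)

  val lowercase : string -> string
  [@@ocaml.deprecated "Use lowercase_ascii instead."]
end

module Case : CASE = struct
  let lowercase_ascii = String.lowercase_ascii
  (* Old name kept as an alias so that existing programs keep building. *)
  let lowercase = String.lowercase_ascii
end

let () = print_endline (Case.lowercase_ascii "Still Builds")
```

Code calling Case.lowercase would still compile and get a deprecation alert, while the generated documentation would no longer advertise it.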

dra27 · July 7, 2022, 6:32pm

I agree - this is something I wanted to address in OCaml 4.x (and have outline plans and a previous core dev presentation for), and having “completed” a large number of deprecations in OCaml 5.0, I intend to ensure we use it for any future deprecations.

I don’t think this would help here. The problem we’ve had with deprecations is that functions have been deprecated without a clear story, in place at the release which introduced the deprecation, for what one is supposed to do when writing code which supports both patterns. For example, the day OCaml 4.03.0 was released, there was no clear and simple path for using the _ascii functions while still supporting OCaml 4.02 and earlier. Hence, 12 releases later when the old functions were removed, many current libraries still used them, or had hand-rolled and consequently broken shims.
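To illustrate the kind of hand-rolled shim in question, here is a minimal sketch that avoids both the old String.lowercase and String.lowercase_ascii, so it should build before and after 4.03. It is one possible approach, not a recommended or official one, and the function names are local to this example.

```ocaml
(* ASCII-only: only 'A'..'Z' are mapped. Every other byte is left alone,
   including the Latin-1 range that the old String.lowercase converted. *)
let lowercase_ascii_char c =
  if c >= 'A' && c <= 'Z' then Char.chr (Char.code c + 32) else c

(* String.map has been in the standard library since OCaml 4.00. *)
let lowercase_ascii s = String.map lowercase_ascii_char s

let () = print_endline (lowercase_ascii "Hello, WORLD")
```

Even a shim this small has to get the Latin-1 detail right, which is the sort of thing that goes wrong when every library writes its own.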

gadmm · July 9, 2022, 8:40pm

Are the plans or the presentation public?
