JSOO/ReScript 'primes' benchmark

Continuing the discussion from A short history of ReScript (BuckleScript): thought I’d start a new thread instead of continuing there since it was getting fairly OT.

You have a nice fast machine. By way of comparison, how long does the native ocamlopt code take to solve the prime?

Thanks :slight_smile: It’s a little indulgence for trying out different tech stacks. The native and bytecode benchmarks:

$ ./primes.opt
Elapsed time: 0.142687
1299709
$ ./primes.byte
Elapsed time: 0.475783
1299709

Loading it in the head seems to improve the chance of the JITer getting on the fast path, particularly if reloading the page.

Interesting. I made the change and get the following:

  • Firefox: 0.105s (before: 0.117s, seems like a speedup)
  • Chromium: 0.095s (before: 0.094s, doesn’t seem like a significant difference)

if I tweak your rescript-generated primes.js file by omitting the imports of curry.js and caml_int32.js and substituting standard javascript, the rescript version runs slightly faster than the JSOO version in chrome and slightly slower in firefox.

Yeah, the ‘curry’ stuff is because the compiler has to check whether the return value is a curried function or if it’s fully applied. There’s actually support for totally uncurried functions, with [@bs] annotation in OCaml syntax and dedicated ergonomic syntax in ReScript. I purposely didn’t use that though because I typically wouldn’t use it unless performance was actually a concern.

4 Likes

The consistent thing that emerges is that the emitted javascript of both rescript and JSOO is, with the tight loop optimizations of modern browser JIT implementations, faster than native ocaml for this workload. I don’t think that would often happen in more normal workloads but it is still quite impressive.

By contrast with my 0.29 secs for chrome and 0.26 secs for firefox, the same code rewritten in common lisp using sbcl (I think about the fastest lisp) scores at 1.05 seconds, in scheme chez-scheme and gambit (I think about the fastest schemes) come in at 1.13 seconds and 1.04 seconds respectively, and python3 is miserable at 5.7 seconds.

I guess this speaks to the quality of javascript’s current JITing engines.

Edit: I have become sufficiently intrigued that I have also just rewritten this for C++. This algorithm comes in on C++ at 0.21 seconds. So for this algorithm JSOO/rescript are nearly as fast as C++, and native ocaml is just under 2 and a half time slower, which isn’t bad at all.

4 Likes

Some more context would help people follow this conversation:

  • Could one person list all relevant timings on their machine for the benchmark ine one place? Currently the timings are speared across the two threads.
  • Could someone post a URL to a browsable version of the benchmark, not an archive uploaded somewhere? This would be useful for people who want to know what sort of code we are talking about, but not necessarily play with it locally.
3 Likes

True, sorry :slight_smile: Let me provide some more context. First of all, this is a small ‘benchmark’ that @cvine and I are playing around with to mainly check how JavaScript engines handle the OCaml-to-JS compilers’ output JS.

I’ve tried preparing a JSBench benchmark, but the JSOO output JS size is actually too large to save there–I get an error when I try to save it.

However, I have tried to do a slightly more formal comparison and prepared a new repo: GitHub - yawaramin/primes-ocaml-benchmarks: Benchmark of prime number calculation in OCaml compilers

I believe this is a bit better in terms of methodology–it also includes variants of his original native, bytecode, and JSOO benchmarks–but I’m open to suggestions.

1 Like

You are doing something wrong. JSOO and rescript are not over 600x slower than native ocaml for this algorithm. For this algorithm they are (I agree surprisingly) somewhat faster than native.

I really don’t think that this particular benchmark is worth the effort. It started as a matter of interest. Let’s not take it out of proportion.

Yeah, I may well have messed up the measurement somehow. I agree, it’s not very meaningful by itself–micro-benchmarks like this are nearly always worse than actually measuring performance of a realistic application. The only meaningful result from here might be the bundle size benefit especially for slower mobile networks.

Your native and bytecode ocaml measurement is out by a factor of 1000. Sys.time evaluates to seconds.

1 Like

Ah, right you are. That was silly of me :slight_smile: Fixed now:

  • Native: 1.57x
  • Bytecode: 5.34x
  • JSOO: 1.11x
  • ReScript: 1x

That looks about right, and it’s interesting. As a general rule of thumb, in my opinion ocaml native is normally going to beat out JSOO and rescript reasonably comfortably, but I cannot provide data to back that up. But if your program is bound by tight loops, as in some numerical calculations such as this test case, the very high quality of today’s javascript implementations, and in particular their optimized JIT compilation of loops, means that rescript/JSOO may beat you out.

1 Like

I’m no Jsoo expert, but I notice you are compiling with lwt packages and ppx packages, but doesn’t look like you’re using any of those in your Ocaml code. Are these necessary dependencies to compile the example?

Ah, good point, that was from Chris’ original version of the code which was using the DOM onload event. I’ve removed it now, but it doesn’t really make a significant difference.

1 Like

That makes sense! Iiic, the dead code elimination should have cleaned it up any how, but nice to simplify where possible any how :slight_smile:

I would guess that the native-code performance (relative to Js) is harmed by this operation:

    let lim = truncate ((sqrt (float_of_int num)) +. 0.5) in

sqrt is not a primitive, it goes through a C call, whereas I would guess that Javascript engines optimize this better (especially as truncate and float_of_int are no-ops in Javascript land).

But you don’t need this to compute primes: instead of

    let lim = truncate ((sqrt (float_of_int num)) +. 0.5) in
    let rec lp curr =
      if curr > lim then

just use:

    let rec lp curr =
      if curr * curr > num then

My prediction would be that naive would look much better, relative to other benchmarks, with this new version. (I still expect Js engines to do pretty well as those are just arithmetic-computation loops.)

2 Likes

I got that your variant is 8% slower than the previous one: fastware/primes/primes.ml at master · Kakadu/fastware · GitHub

It does look better, relatively, on my original code (I haven’t run yawaramin’s version). Of course, calculating the limit (the integer root) out of the loop amortizes the cost over the whole of the time in the loop. The alternative of squaring each tested number has to be carried out on each iteration of the loop. On my own code, squaring on each iteration instead of taking the root slows all of the results (by 19% for the JSOO version, 7% for ocaml bytecode and 1% for ocaml native) so there is a relative closing of the gap but the JSOO version is still ahead.

1 Like