Benchmarking bucklescript vs. js_of_ocaml

I have been tinkering with OCaml, bucklescript, and js_of_ocaml. I have a microbenchmark that I’ve been implementing for various languages over the last month or so as I sample them; implementing it and running the results through BS and JSOO yielded surprising (at least for me) results:

=== bucklescript (reason) ===
9.23s

=== js_of_ocaml ===
57.49s

I have some understanding of each compiler’s respective design constraints, so I was expecting that ordering, but not a margin that large. All of the source, compiler invocations, etc. are available at https://github.com/cemerick/silly-shootout, so if anyone sees any place where I’m making an egregious error, I’d appreciate that feedback.

Thanks very much :smiley:

— Chas

5 Likes

It might be polymorphic compare. In BuckleScript, min and max will compile to type-specific functions, or even JavaScript primitive functions where possible.

Good suggestion! Using local implementations of min and max that call out to js directly:

  module Unsafe = Js_of_ocaml.Js.Unsafe;;
  let jsmin a b =
    Unsafe.fun_call (Unsafe.js_expr "Math.min") [|Unsafe.inject a; Unsafe.inject b|]
  let jsmax a b =
    Unsafe.fun_call (Unsafe.js_expr "Math.max") [|Unsafe.inject a; Unsafe.inject b|]

cuts the JSOO-emitted code’s runtime from 58s to 44s. A good improvement, but obviously there’s more going on.

BTW, regarding BuckleScript: it does provide its own min and max impls, but they’re no slower than calling out to Math.min/max. It turns out that its runtime representation of records is responsible for the entirety of the margin between its generated code and the “native” JS implementation I have in that repo. Details here: https://github.com/BuckleScript/bucklescript/issues/2922#issuecomment-454683001

1 Like

I played around with the generated JS code a little bit and found that optimising the ref into a variable also makes a big difference.

1257,1258c1257
<         A = 0,
<         B = J[1];
---
>         A = 0;
1260a1260
>             B = J[1],
1267c1267
<         B = [au, V(aP, 20), aS, aR, aQ];
---
>         J[1] = [au, V(aP, 20), aS, aR, aQ];
1275d1274
<         J[1] = B;

Timings on my machine:

Initial: 45.408s
With non-polymorphic min/max: 34.080s
Optimised ref: 33.742s
Both: 7.317s

And, for reference, bucklescript: 6.885s

And lastly, a functional version:

let rec f i rect =
  let rect = union rect (r 20. 0. 100. (float_of_int i)) in
  if i = lim then rect
  else f (i+1) rect

Functional with non-polymorphic min/max: 7.208s
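For anyone who wants to try the functional version outside the repo, here’s a self-contained sketch; the `rect`/`r`/`union`/`lim` definitions below are stand-ins for the benchmark’s, not copies of them:

```ocaml
(* Stand-in rect type and constructor (the real ones live in the repo). *)
type rect = { x : float; y : float; w : float; h : float }

let r x y w h = { x; y; w; h }

(* Monomorphic min/max, avoiding polymorphic compare. *)
let fmin (a : float) (b : float) : float = if a < b then a else b
let fmax (a : float) (b : float) : float = if a > b then a else b

(* Bounding box of two rects. *)
let union a b =
  let x = fmin a.x b.x and y = fmin a.y b.y in
  let x2 = fmax (a.x +. a.w) (b.x +. b.w)
  and y2 = fmax (a.y +. a.h) (b.y +. b.h) in
  { x; y; w = x2 -. x; h = y2 -. y }

let lim = 1_000_000

(* The tail-recursive reduction replacing the ref-based loop. *)
let rec f i rect =
  let rect = union rect (r 20. 0. 100. (float_of_int i)) in
  if i = lim then rect else f (i + 1) rect

let () =
  let result = f 0 (r 25. 25. 200. 200.) in
  print_endline (string_of_float result.h)
```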

5 Likes

Ech, of course this is just fine for a non-polymorphic compare, and yields the same results as the “unsafe” JS calls.

let min (a: float) (b: float) : float = if a < b then a else b
let max (a: float) (b: float) : float = if a > b then a else b

You’re right, cutting out the ref (via the recursive reduction) + monomorphic min/max gets very close to the bucklescript level, down to 12.5s on my machine. Good catch!

Eliminating the ref is an obvious improvement, but having to dodge mutability to get solid performance isn’t great. (A great deal of the codebase I’m looking to port is effectively pure, but it does sit on top of a core graph data structure that uses mutation heavily, especially when being initially loaded.)

I thought maybe a mutable “box” record would get the same job done, but it actually had exactly the same performance as the ref:

type 't box = {mutable v: 't}
let _ = let rect = {v = r 25. 25. 200. 200.} in
        for i = 0 to lim do
            rect.v <- union rect.v @@ r 20. 0. 100. (float_of_int i)
        done;
        print_endline @@ string_of_float @@ rect.v.ty

Are mutable record fields actually refs underneath?

The reverse is true: references are records with a single mutable field: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Pervasives.html#1_References
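For reference, the stdlib definition is roughly the following (the real operators are compiler externals, but they behave exactly like this record code):

```ocaml
(* Roughly how Stdlib defines references: a record with one mutable field. *)
type 'a ref = { mutable contents : 'a }

let ref v = { contents = v }        (* allocate a fresh cell *)
let ( ! ) r = r.contents            (* dereference *)
let ( := ) r v = r.contents <- v    (* assign *)

let () =
  let counter = ref 0 in
  counter := !counter + 1;
  assert (!counter = 1)
```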

1 Like

Ah, appreciate the pointer. Somehow I was under the impression that refs and their operators were language primitives instead of part of the stdlib. :thinking:

So with monomorphic min/max and the immutable recursive reduction, things are down to:

=== bucklescript (reason) ===
8.87s

=== js_of_ocaml ===
12.53s

That’s a big improvement for JSOO, but the tradeoff is unfortunate. Polymorphic compare can be worked around; the representation of mutable fields, not so much. I’ve thus pushed an update to the benchmark that includes the former, but sticks with the ref.

Maybe Unsafe could be used to eliminate that overhead for the hottest spots…

Have you tried turning off -g for the js_of_ocaml case? IIRC -g turns off ref optimisation in bytecode, and since js_of_ocaml works off the bytecode that’s going to have a negative effect. Since BuckleScript uses a modified compiler it might avoid that issue.

Note that this optimisation only applies to refs (records with a single field). For the rather silly redefinition of ref below, jsoo again has a similar performance profile to bucklescript.

type 'a ref = { mutable contents: 'a; bloat: unit }
let ref x = { contents = x; bloat = () }
let (!) { contents; _ } = contents
let (:=) r x = r.contents <- x

I’m not using a build tool, so I can know/control exactly what’s being done. I’m not passing -g, so I presume debugging isn’t enabled (I don’t see any -no-g or similar, so I assume that’s the default).

I’m compiling using this invocation:

ocamlc bench.ml && js_of_ocaml --opt=3 a.out

Do let me know if I should be using something different.

To clarify, you’re seeing this redefinition of ref et al. yield a significant improvement in JSOO output’s performance? I’ve not been able to replicate so far.

I found the issue: the uses of @@ are preventing the optimisation. If you remove them then things should improve.
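Concretely, the fix amounts to replacing `@@` at the ref’s binding site with direct application; a minimal illustration (the `r` constructor here is a stand-in, not the repo’s):

```ocaml
(* Stand-in for the benchmark's rect constructor. *)
let r x y w h = (x, y, w, h)

(* Binding the ref through @@ hides the allocation from the simple,
   syntactic ref-to-variable optimisation: *)
let with_apply_op () =
  let rect = ref @@ r 25. 25. 200. 200. in
  !rect

(* Direct application is the form the optimisation recognises: *)
let with_direct_application () =
  let rect = ref (r 25. 25. 200. 200.) in
  !rect
```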

4 Likes

No, I was just trying to show that jsoo and bucklescript have a comparable performance when it comes to records with more than one field.

Ugh. Why does @@ lead to de-optimization?

1 Like

Thank you! I’ve pushed a change based on this suggestion, and the timings are now quite tight:

=== bucklescript (reason) ===
9.98s

=== js_of_ocaml ===
12.90s

The change also took ~20% off the runtime of the ocamlopt.opt output.

A few interesting things determined experimentally:

  1. Either composition operator (@@ or |>) can trigger the deoptimization.
  2. The only usage that was impactful was where I had used @@ in declaring the binding of the ref. Using @@ or |> anywhere else had no negative consequences.

All of this is very surprising, at least to me:

  1. In the languages I’ve used that have operators like this (e.g. $ and co. in Haskell, -> and co. in Clojure et al., etc.), they are fundamentally syntactic affordances. Clearly something more is going on here.
  2. In cases where all of the arguments to a function are provided (like here, e.g. let foo = ref @@ expressions...), shouldn’t composition operators simply vanish?
  3. To that point, a post coinciding with the 4.0 release talked about implementing exactly that optimization, but apparently that is no longer the case?
1 Like

The ref optimisation is very simple and syntactic. It would need to either detect @@ as a special case or be done after inlining to catch this case. In flambda it is done after inlining so it should work fine in this case, but of course you can’t use flambda with bytecode/js_of_ocaml.

1 Like

FWIW, I am using +fp+flambda, and the ocamlopt.opt timings were impacted by the composition operators, 5.15s vs. 6.32s. ¯\_(ツ)_/¯