Benchmarking bucklescript vs. js_of_ocaml

I have been tinkering with OCaml, bucklescript, and js_of_ocaml. I have a microbenchmark that I’ve been implementing for various languages over the last month or so as I sample them; implementing it and running the results through BS and JSOO yielded surprising (at least for me) results:

=== bucklescript (reason) ===
9.23s

=== js_of_ocaml ===
57.49s

I have some understanding of each compiler’s respective design constraints, so I was expecting that ordering, but not a margin that large. All of the source, compiler invocations, etc. are available at https://github.com/cemerick/silly-shootout, so if anyone sees any place where I’m making an egregious error, I’d appreciate that feedback.

Thanks very much :smiley:

— Chas

5 Likes

It might be polymorphic compare. In BuckleScript, min and max will compile to type-specific functions, or even JavaScript primitive functions where possible.

Good suggestion! Using local implementations of min and max that call out to js directly:

  module Unsafe = Js_of_ocaml.Js.Unsafe;;
  let jsmin a b =
    Unsafe.fun_call (Unsafe.js_expr "Math.min") [|Unsafe.inject a; Unsafe.inject b|]
  let jsmax a b =
    Unsafe.fun_call (Unsafe.js_expr "Math.max") [|Unsafe.inject a; Unsafe.inject b|]

cuts the JSOO-emitted code’s runtime from 58s to 44s. A good improvement, but obviously there’s more going on.

BTW, regarding BuckleScript: it does provide its own min and max impls, but they’re no slower than calling out to Math.min/max. It turns out that its runtime representation of records is responsible for the entirety of the margin between its generated code and the “native” JS implementation I have in that repo. Details here: https://github.com/BuckleScript/bucklescript/issues/2922#issuecomment-454683001

1 Like

I played around with the generated JS code a little bit and found that optimising the ref into a variable also makes a big difference.

1257,1258c1257
<         A = 0,
<         B = J[1];
---
>         A = 0;
1260a1260
>             B = J[1],
1267c1267
<         B = [au, V(aP, 20), aS, aR, aQ];
---
>         J[1] = [au, V(aP, 20), aS, aR, aQ];
1275d1274
<         J[1] = B;

Timings on my machine:

Initial: 45.408s
With non-polymorphic min/max: 34.080s
Optimised ref: 33.742s
Both: 7.317s

And, for reference, bucklescript: 6.885s

And lastly, a functional version:

let rec f i rect =
  let rect = union rect (r 20. 0. 100. (float_of_int i)) in
  if i = lim then rect
  else f (i+1) rect

Functional with non-polymorphic min/max: 7.208s
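For anyone who wants to try the functional version outside the repo, here’s a self-contained sketch; the `rect`/`r`/`union`/`lim` definitions below are stand-ins for the benchmark’s, not copies of them:

```ocaml
(* Stand-in rect type and constructor (the real ones live in the repo). *)
type rect = { x : float; y : float; w : float; h : float }

let r x y w h = { x; y; w; h }

(* Monomorphic min/max, avoiding polymorphic compare. *)
let fmin (a : float) (b : float) : float = if a < b then a else b
let fmax (a : float) (b : float) : float = if a > b then a else b

(* Bounding box of two rects. *)
let union a b =
  let x = fmin a.x b.x and y = fmin a.y b.y in
  let x2 = fmax (a.x +. a.w) (b.x +. b.w)
  and y2 = fmax (a.y +. a.h) (b.y +. b.h) in
  { x; y; w = x2 -. x; h = y2 -. y }

let lim = 1_000_000

(* The tail-recursive reduction replacing the ref-based loop. *)
let rec f i rect =
  let rect = union rect (r 20. 0. 100. (float_of_int i)) in
  if i = lim then rect else f (i + 1) rect

let () =
  let result = f 0 (r 25. 25. 200. 200.) in
  print_endline (string_of_float result.h)
```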

5 Likes

Ech, of course this is just fine for a non-polymorphic compare, and yields the same results as the “unsafe” JS calls.

let min (a: float) (b: float) : float = if a < b then a else b
let max (a: float) (b: float) : float = if a > b then a else b

You’re right, cutting out the ref (via the recursive reduction) + monomorphic min/max gets very close to the bucklescript level, down to 12.5s on my machine. Good catch!

Eliminating the ref is an obvious improvement, but having to dodge mutability to get solid performance isn’t great. (A great deal of the codebase I’m looking to port is effectively pure, but it does sit on top of a core graph data structure that uses mutation heavily, especially when being initially loaded.)

I thought maybe a mutable “box” record would get the same job done, but it actually had exactly the same performance as the ref:

type 't box = {mutable v: 't}
let _ = let rect = {v = r 25. 25. 200. 200.} in
        for i = 0 to lim do
            rect.v <- union rect.v @@ r 20. 0. 100. (float_of_int i)
        done;
        print_endline @@ string_of_float @@ rect.v.ty

Are mutable record fields actually refs underneath?

The reverse is true: references are records with a single mutable field: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Pervasives.html#1_References
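For reference, the stdlib definition is roughly the following (the real operators are compiler externals, but they behave exactly like this record code):

```ocaml
(* Roughly how Stdlib defines references: a record with one mutable field. *)
type 'a ref = { mutable contents : 'a }

let ref v = { contents = v }        (* allocate a fresh cell *)
let ( ! ) r = r.contents            (* dereference *)
let ( := ) r v = r.contents <- v    (* assign *)

let () =
  let counter = ref 0 in
  counter := !counter + 1;
  assert (!counter = 1)
```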

1 Like

Ah, appreciate the pointer. Somehow I was under the impression that refs and their operators were language primitives instead of part of the stdlib. :thinking:

So with monomorphic min/max and the immutable recursive reduction, things are down to:

=== bucklescript (reason) ===
8.87s

=== js_of_ocaml ===
12.53s

That’s a big improvement for JSOO, but the tradeoff is unfortunate. Polymorphic compare can be worked around; the representation of mutable fields, not so much. I’ve thus pushed an update to the benchmark that includes the former, but sticks with the ref.

Maybe Unsafe could be used to eliminate that overhead for the hottest spots…

Have you tried turning off -g for the js_of_ocaml case? IIRC -g turns off ref optimisation in bytecode, and since js_of_ocaml works off the bytecode that’s going to have a negative effect. Since BuckleScript uses a modified compiler it might avoid that issue.

Note that this optimisation only applies to refs (records with a single field). For the rather silly redefinition of ref below, jsoo again has a similar performance profile to bucklescript.

type 'a ref = { mutable contents: 'a; bloat: unit }
let ref x = { contents = x; bloat = () }
let (!) { contents; _ } = contents
let (:=) r x = r.contents <- x

I’m not using a build tool, so I can know/control exactly what’s being done. I’m not passing -g, so I presume debugging isn’t enabled (I don’t see any -no-g or similar, so I assume that’s the default).

I’m compiling using this invocation:

ocamlc bench.ml && js_of_ocaml --opt=3 a.out

Do let me know if I should be using something different.

To clarify, you’re seeing this redefinition of ref et al. yield a significant improvement in JSOO output’s performance? I’ve not been able to replicate so far.

I found the issue: the uses of @@ are preventing the optimisation. If you remove them then things should improve.
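Concretely, the fix amounts to replacing `@@` at the ref’s binding site with direct application; a minimal illustration (the `r` constructor here is a stand-in, not the repo’s):

```ocaml
(* Stand-in for the benchmark's rect constructor. *)
let r x y w h = (x, y, w, h)

(* Binding the ref through @@ hides the allocation from the simple,
   syntactic ref-to-variable optimisation: *)
let with_apply_op () =
  let rect = ref @@ r 25. 25. 200. 200. in
  !rect

(* Direct application is the form the optimisation recognises: *)
let with_direct_application () =
  let rect = ref (r 25. 25. 200. 200.) in
  !rect
```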

4 Likes

No, I was just trying to show that jsoo and bucklescript have a comparable performance when it comes to records with more than one field.

Ugh. Why does @@ lead to de-optimization?

1 Like

Thank you! I’ve pushed a change based on this suggestion, and the timings are now quite tight:

=== bucklescript (reason) ===
9.98s

=== js_of_ocaml ===
12.90s

The change also took ~20% off the runtime of the ocamlopt.opt output.

A few interesting things determined experimentally:

  1. Either composition operator (@@ or |>) can trigger the deoptimization.
  2. The only usage that was impactful was where I had used @@ in declaring the binding of the ref. Using @@ or |> anywhere else had no negative consequences.

All of this is very surprising, at least to me:

  1. In the languages I’ve used that have operators like this (e.g. $ and co. in Haskell, -> and co. in Clojure et al., etc.), they are fundamentally syntactic affordances. Clearly something more is going on here.
  2. In cases where all of the arguments to a function are provided (like here, e.g. let foo = ref @@ expressions...), shouldn’t composition operators simply vanish?
  3. To that point, a post coinciding with the 4.0 release talked about implementing exactly that optimization, but apparently that is no longer the case?
1 Like

The ref optimisation is very simple and syntactic. It would need to either detect @@ as a special case or be done after inlining to catch this case. In flambda it is done after inlining so it should work fine in this case, but of course you can’t use flambda with bytecode/js_of_ocaml.

1 Like

FWIW, I am using +fp+flambda, and the ocamlopt.opt timings were impacted by the composition operators, 5.15s vs. 6.32s. ¯\_(ツ)_/¯