I have been tinkering with OCaml, bucklescript, and js_of_ocaml. I have a microbenchmark that I’ve been implementing for various languages over the last month or so as I sample them; implementing it and running the results through BS and JSOO yielded surprising (at least for me) results:
I have some understanding of each compiler’s respective design constraints, so I was expecting that ordering, but not a margin that large. All of the source, compiler invocations, etc. are available in https://github.com/cemerick/silly-shootout, so if anyone sees any place where I’m making an egregious error, I’d appreciate that feedback.
Good suggestion! Using local implementations of min and max that call out to JS directly:
module Unsafe = Js_of_ocaml.Js.Unsafe

let jsmin a b =
  Unsafe.fun_call (Unsafe.js_expr "Math.min") [|Unsafe.inject a; Unsafe.inject b|]
let jsmax a b =
  Unsafe.fun_call (Unsafe.js_expr "Math.max") [|Unsafe.inject a; Unsafe.inject b|]
cuts the JSOO-emitted code’s runtime from 58s to 44s. A good improvement, but obviously there’s more going on.
BTW, regarding BuckleScript: it does provide its own min and max impls, but they’re no slower than calling out to Math.min/max. It turns out that its runtime representation of records is responsible for the entirety of the margin between its generated code and the “native” JS implementation I have in that repo. Details here: https://github.com/BuckleScript/bucklescript/issues/2922#issuecomment-454683001
You’re right, cutting out the ref (via the recursive reduction) + monomorphic min/max gets very close to the bucklescript level, down to 12.5s on my machine. Good catch!
Eliminating the ref is an obvious improvement, but having to dodge mutability to get solid performance isn’t great. (A great deal of the codebase I’m looking to port is effectively pure, but it does sit on top of a core graph data structure that uses mutation heavily, especially when being initially loaded.)
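For the curious, here is a minimal sketch of what the ref-free recursive reduction might look like; the `rect` type and the `r`/`union` helpers are hypothetical stand-ins for the benchmark’s actual definitions, not copied from it:

```ocaml
(* Hypothetical rectangle type and helpers standing in for the benchmark's. *)
type rect = { x : float; y : float; tx : float; ty : float }

let r x y tx ty = { x; y; tx; ty }

(* Bounding-box union, using monomorphic float comparisons rather than the
   polymorphic Stdlib.min/max. *)
let union a b =
  { x  = (if a.x  < b.x  then a.x  else b.x);
    y  = (if a.y  < b.y  then a.y  else b.y);
    tx = (if a.tx > b.tx then a.tx else b.tx);
    ty = (if a.ty > b.ty then a.ty else b.ty) }

(* Accumulate via recursion instead of a ref + for loop. *)
let result lim =
  let rec loop i acc =
    if i > lim then acc
    else loop (i + 1) (union acc (r 20. 0. 100. (float_of_int i)))
  in
  loop 0 (r 25. 25. 200. 200.)
```

The accumulator threads through the recursion as an ordinary immutable value, so no mutable cell (and thus no boxed-ref representation) ever appears in the compiled output.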
I thought maybe a mutable “box” record would get the same job done, but it actually had exactly the same performance as the ref:
type 't box = {mutable v: 't}
let _ =
  let rect = {v = r 25. 25. 200. 200.} in
  for i = 0 to lim do
    rect.v <- union rect.v @@ r 20. 0. 100. (float_of_int i)
  done;
  print_endline @@ string_of_float @@ rect.v.ty
Are mutable record fields actually refs underneath?
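(For reference, the stdlib defines things the other way around: a ref is itself just a single-field mutable record, with primitive accessors, so a one-field mutable box shares its representation. These are the actual Stdlib declarations:)

```ocaml
(* From the standard library (Stdlib): a ref is a record with a single
   mutable field, and its operations are compiler primitives. *)
type 'a ref = { mutable contents : 'a }
external ref : 'a -> 'a ref = "%makemutable"
external ( ! ) : 'a ref -> 'a = "%field0"
external ( := ) : 'a ref -> 'a -> unit = "%setfield0"
```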
That’s a big improvement for JSOO, but the tradeoff is unfortunate. Polymorphic compare can be worked around; the representation of mutable fields, not so much. I’ve thus pushed an update to the benchmark that includes the former, but sticks with the ref.
Maybe Unsafe could be used to eliminate that overhead for the hottest spots…
Have you tried turning off -g for the js_of_ocaml case? IIRC -g turns off ref optimisation in bytecode, and since js_of_ocaml works off the bytecode, that’s going to have a negative effect. Since BuckleScript uses a modified compiler, it might avoid that issue.
Note that this optimisation only applies to refs (records with a single field). For the rather silly redefinition of ref below, jsoo again has a similar performance profile to bucklescript.
type 'a ref = { mutable contents: 'a; bloat: unit }
let ref x = { contents = x; bloat = () }
let (!) { contents; _ } = contents
let (:=) r x = r.contents <- x
I’m not using a build tool, so I can know/control exactly what’s being done. I’m not passing -g, so I presume debugging isn’t enabled (I don’t see any -no-g or similar, so I assume that’s the default).
I’m compiling using this invocation:
ocamlc bench.ml && js_of_ocaml --opt=3 a.out
Do let me know if I should be using something different.
To clarify, you’re seeing this redefinition of ref et al. yield a significant improvement in JSOO output’s performance? I’ve not been able to replicate so far.
The change also took ~20% off the runtime of the ocamlopt.opt output.
A few interesting things determined experimentally:
Either composition operator (@@ or |>) can trigger the deoptimization.
The only usage that was impactful was where I had used @@ in declaring the binding of the ref. Using @@ or |> anywhere else had no negative consequences.
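Concretely, the problematic pattern had this shape (with `r` as a hypothetical stand-in for the benchmark’s rectangle constructor); only the first binding defeated the ref optimisation:

```ocaml
(* Hypothetical stand-in for the benchmark's rectangle constructor. *)
let r x y w h = (x, y, w, h)

(* Declaring the ref via @@ defeated the bytecode ref optimisation: *)
let rect_slow = ref @@ r 25. 25. 200. 200.

(* Writing the application directly avoids the problem: *)
let rect_fast = ref (r 25. 25. 200. 200.)
```

Both bindings are semantically identical; the difference is purely in what the bytecode compiler’s syntactic optimisation pass can see.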
All of this is very surprising, at least to me:
In the languages I’ve used that have operators like this (e.g. $ and co. in Haskell, -> and co. in Clojure et al.), they are fundamentally syntactic affordances. Clearly something more is going on here.
In cases where all of the arguments to a function are provided (like here, e.g. let foo = ref @@ expressions...), shouldn’t composition operators simply vanish?
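For context, the stdlib does declare these operators as compiler primitives for plain (and reversed) application, which is why one might expect them to disappear entirely after compilation:

```ocaml
(* The actual Stdlib declarations of the application operators: *)
external ( @@ ) : ('a -> 'b) -> 'a -> 'b = "%apply"
external ( |> ) : 'a -> ('a -> 'b) -> 'b = "%revapply"
```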
The ref optimisation is very simple and syntactic. It would need to either detect @@ as a special case or be done after inlining to catch this case. In flambda it is done after inlining so it should work fine in this case, but of course you can’t use flambda with bytecode/js_of_ocaml.