Hi everyone! I have been using js_of_ocaml (via the Brr toolkit) to run my pure OCaml Game Boy emulator in the browser. I am now trying to minify the JS bundle, and one thing I noticed is that the JS output compiled with the --profile=release flag (via dune build --profile release) runs considerably slower than the output compiled without the flag (via dune build). This was surprising, as I was expecting the release build to perform as well as (if not slightly better than) the normal build.
I have hosted the emulator without the UI (i.e. in “headless” mode) at the following URLs to demonstrate the difference:
Built with dune build --profile=release: CAMLBOY BENCH
When I run both builds in my local environment (i7-8665U CPU @ 1.90GHz, Ubuntu 20.04.2, Google Chrome Version 95.0.4638.69), the build without the flag runs at ~260 FPS while the release build runs at ~116 FPS.
Question:
Is this expected behavior and/or am I completely missing something? And is there some way to get a minified output without a performance hit?
This is just a guess, but doing the profiling in Chrome it looks like most of the time is spent in “Run Microtasks” which from what I can find means things like setTimeout(...). I would try a benchmark that doesn’t use setTimeout between the calls, maybe call main_loop directly (this may freeze the browser for the duration of the test) and see if release is still slower.
I’m wondering if release actually is faster, but it’s spamming the browser internals too much and hitting some slowdown there?
Sorry to sidetrack this thread, but you wrote a Game Boy emulator in OCaml! That’s so cool. Happy to add you to the sparse Application page in OCamlverse.
I would try a benchmark that doesn’t use setTimeout between the calls, maybe call main_loop directly (this may freeze the browser for the duration of the test) and see if release is still slower.
Actually, the current headless mode benchmark runs the main_loop directly without any calls to setTimeout (namely at this part in bench.ml). Thanks for taking a look anyways!
But didn’t see explicit JSOO release mode callouts.
Yeah, I couldn’t find much documentation around this either.
assert that --profile=release activates the --opt=3 flag when calling into js_of_ocaml
Thanks for the suggestion. I ran dune build with and without profile=release together with the --verbose flag suggested by @kit-ky-kate and found that:
--opt 3 was NOT passed to js_of_ocaml in either case
When built without profile=release, the build goes through the separate compilation steps: it separately compiles the cmo/cma files of the dependent libraries and the main benchmark code with the --pretty --source-map-inline flags and then links them together. I have documented a simplified version of the --verbose output here.
Based on the first point, it looks like the presence or absence of --opt=3 is not what explains the difference. The second point is also pretty interesting, as the official doc states that separate compilation is supposed to have a “slower runtime”. Quote:
Separate compilation improves the overall compilation times and gives (many) incremental build opportunities.
These improvements come at the cost of bigger executable and slower runtime.
Did you make a typo? Separate compilation is for profile=dev only.
Dune 3.0 will allow you to customize jsoo flags per profile, so you will be able to pass --opt=3 there. I don’t think that --opt=3 should be inserted by dune by default. If it’s the better default for release builds, then jsoo should make that call.
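As a rough sketch of that per-profile customization, assuming Dune 3.0 or later (with (lang dune 3.0) in dune-project), the env stanza accepts a js_of_ocaml field. The choice of --opt=3 below is only illustrative, not a recommendation:

```dune
; Sketch only: goes in a dune file, e.g. at the workspace root.
; Applies the extra js_of_ocaml flag only when building with --profile release.
(env
 (release
  (js_of_ocaml
   (flags (:standard --opt=3)))))
```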
Adding (js_of_ocaml (flags (:standard --no-inline)))
to the dune file in bin/web speeds up the release benchmark by quite a lot. With this flag release is also faster than non-release by ~20fps on my machine.
With inlining it looks like it inlines the entire fetch+decode into the benchmark loop and is kind of a mess. My guess is the javascript JIT is having a hard time optimizing that code. Without inlining everything conforms closer to the original code.
You can add --pretty to the flags above to see the release mode code better.
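For reference, a minimal sketch of where that field sits in an executable stanza. The executable name main and the brr dependency are assumptions for illustration; the actual dune file in bin/web may look different:

```dune
; Sketch only: "main" is a hypothetical executable name.
(executable
 (name main)
 (modes js)
 (libraries brr)
 (js_of_ocaml
  ; Append --pretty after --no-inline to get readable output while inspecting.
  (flags (:standard --no-inline))))
```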
It’s a bit off-topic, but I don’t believe compilers should have a notion of release vs dev builds; that’s a build system concept. So if some option is better for release builds, the build system should pass it to the compiler, the compiler should not have to infer from unrelated flags what the default optimisation level is. (The same applies to ocamlopt, by the way.)
Adding (js_of_ocaml (flags (:standard --no-inline)))
to the dune file in bin/web speeds up the release benchmark by quite a lot.
Super good find! Adding the --no-inline flag to the release build resulted in ~400 FPS in my local environment, which is ~3.3 times faster than release without --no-inline and ~1.5 times faster than the non-release (i.e. dev) build. I have summarized the result in the chart below:
The “release with --no-inline” also reduced the JS bundle size from 1MB to 52KB when compared to the dev build, which is nice.
So in summary, in this case, the cause of the poorly performing release build was inlining (this may be related to some JIT optimization as @OCamlUser guessed, but I am not sure). My non-release dev build was probably performing better because it happened to be unable to do the inlining due to the restrictions of separate compilation.
I kind of assumed that some optimization was missing from the release build and didn’t think of the possibility that the optimization itself was the cause.
@OCamlUser Thanks a lot! Will mark your comment as the solution.
You are referring to the fact that --opt=3 is not being passed in the release build, right? Then it looks like it is expected behavior according to @rgrinberg (I personally don’t have strong opinions on this).
I was talking about your second point (originally “when built with profile=release”), but it looks like it was a misunderstanding.
Ah, that makes sense, my bad.
Still, the fact that release mode (without --no-inline) is almost 3x slower than dev mode is weird.
Yeah, my emulator (or emulators in general) might have unusual characteristics that cause this, but the difference does seem extreme. I am thinking of taking another look and creating an issue in the ocsigen/js_of_ocaml issue tracker on GitHub in case there are some improvements that can be made here.