Js_of_ocaml output performs considerably worse when built with the --profile=release flag

Hi everyone! I have been using js_of_ocaml (via the Brr toolkit) to run my pure OCaml Game Boy emulator in the browser. I am now trying to minify the JS bundle, and one thing I noticed is that the JS output compiled with the --profile=release flag (via dune build --profile=release) runs considerably slower than the output compiled without the flag (via dune build). This was surprising, as I was expecting the release build to perform as well as, if not slightly better than, the normal build.

I have hosted the emulator without the UI (i.e. in “headless” mode) in the following URLs to demonstrate the difference:

When I run both builds in my local environment (i7-8665U CPU @ 1.90GHz, Ubuntu 20.04.2, Google Chrome Version 95.0.4638.69), the build without the flag runs at ~260 FPS while the --profile=release build runs at ~116 FPS.

You can also compare them with the UI here:

  • Built with dune build: CAMLBOY
  • Built with dune build --profile=release: CAMLBOY

Question:
Is this expected behavior and/or am I completely missing something? And is there some way to get a minified output without a performance hit?

Thanks!

This is just a guess, but doing the profiling in Chrome it looks like most of the time is spent in “Run Microtasks” which from what I can find means things like setTimeout(...). I would try a benchmark that doesn’t use setTimeout between the calls, maybe call main_loop directly (this may freeze the browser for the duration of the test) and see if release is still slower.

I’m wondering if release actually is faster, but it’s spamming the browser internals too much and hitting some slowdown there?

Hmm. I went to consult the dune docs, but didn’t see explicit JSOO release mode callouts.

Ideas:

  • assert that --profile=release activates the --opt=3 flag when calling into js_of_ocaml
    • probably have to dig through the dune source
  • add options to compile with sourcemaps via (js_of_ocaml (<js_of_ocaml-options>)), and create heatmaps in your profiling

dune build --profile=release --verbose should show what is given to js_of_ocaml. No need to dig through the source code for that.

Sorry to sidetrack this thread, but you wrote a Game Boy emulator in OCaml! That’s so cool. Happy to add you to the sparse Application page in OCamlverse.

Gameboy emulators in OCaml are somewhat common, @Engil has written one too: GitHub - Engil/Goodboy: A pure OCaml Gameboy emulator

I was considering making one too or contributing. I’d love a Gameboy emulator specifically designed to run LSDJ. But time is at a premium.

And its unofficial followup GitHub - unsound-io/BetterBoy: The sequel to GoodBoy.
But nowhere near as nice and functional as what @linoscope has done, great work!

I would try a benchmark that doesn’t use setTimeout between the calls, maybe call main_loop directly (this may freeze the browser for the duration of the test) and see if release is still slower.

Actually, the current headless mode benchmark runs the main_loop directly without any calls to setTimeout (namely at this part in bench.ml). Thanks for taking a look anyways!
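
For readers who don't want to open the repository, a benchmark of this kind has roughly the shape below. This is a minimal sketch and not the actual bench.ml: run_frame is a hypothetical stand-in for "emulate until the next frame is ready", and Sys.time is used only to keep the example free of dependencies.

```ocaml
(* Minimal sketch of a headless FPS benchmark: a tight loop with no
   setTimeout or other scheduling between frames.
   [run_frame] is a hypothetical stand-in for emulating one frame. *)
let measure_fps ~frames ~run_frame =
  let start = Sys.time () in
  for _i = 1 to frames do
    run_frame ()
  done;
  let elapsed = Sys.time () -. start in
  (* A very short run can read as zero elapsed time on a coarse clock. *)
  if elapsed > 0. then Some (float_of_int frames /. elapsed) else None

let () =
  (* Dummy workload so that the sketch runs on its own. *)
  let counter = ref 0 in
  let run_frame () =
    for i = 1 to 10_000 do
      counter := !counter + i
    done
  in
  match measure_fps ~frames:1_000 ~run_frame with
  | Some fps -> Printf.printf "%.0f FPS (checksum %d)\n" fps !counter
  | None -> print_endline "run was too short to time"
```

Because the loop never yields to the browser, the page stays frozen until it finishes, which is the trade-off mentioned in the quoted suggestion.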

But didn’t see explicit JSOO release mode callouts.

Yeah, I couldn’t find much documentation around this either.

assert that --profile=release activates the --opt=3 flag when calling into js_of_ocaml

Thanks for the suggestion. I ran dune build with and without profile=release together with the --verbose flag suggested by @kit-ky-kate and found that:

  1. --opt 3 was NOT passed to js_of_ocaml in either case
  2. When built without profile=release the build goes through the separate compilation steps, namely it separately compiles the cmo/cma files of the dependent libraries and the main benchmark code with the --pretty --source-map-inline flags and links them together. I have documented the simplified version of the --verbose output here.

Based on the first point, it looks like the presence or absence of --opt=3 is not the problem. The second point is also pretty interesting, as the official doc states that separate compilation is supposed to have “slower runtime”. Quote:

Separate compilation improves the overall compilation times and gives (many) incremental build opportunities.
Theses improvements come at the cost of bigger executable and slower runtime.

That looks like a bug in dune to me. Could you open an issue in Issues · ocaml/dune · GitHub ?

Did you make a typo? Separate compilation is for profile=dev only.

Dune 3.0 will allow you to customize jsoo flags per profile, so you will be able to pass --opt=3 there. I don’t think that --opt=3 should be inserted by dune by default. If it’s the better default for release builds, then jsoo should make that call.
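
For reference, a per-profile setting could look roughly like the sketch below, assuming the project declares (lang dune 3.0) or later in its dune-project file. Passing --opt=3 here only illustrates the mechanism; it is not a recommendation.

```dune
; In a dune file, e.g. at the workspace root.
; Applies only when building with --profile=release.
(env
 (release
  (js_of_ocaml
   (flags (:standard --opt=3)))))
```

Running dune build --profile=release --verbose afterwards should show whether the flag actually reaches js_of_ocaml.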

Ah :(

Adding
(js_of_ocaml (flags (:standard --no-inline)))
to the dune file in bin/web speeds up the release benchmark by quite a lot. With this flag release is also faster than non-release by ~20fps on my machine.

With inlining, it looks like the entire fetch+decode gets inlined into the benchmark loop and the result is kind of a mess. My guess is that the JavaScript JIT is having a hard time optimizing that code. Without inlining, everything stays closer to the original code.

You can add --pretty to the flags above to see the release mode code better.
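
To show where the stanza goes, here is a minimal sketch of an executable definition. The name index is a placeholder and not necessarily what bin/web uses, other fields such as libraries are omitted, and --pretty is included only to make the generated JavaScript readable while inspecting it.

```dune
(executable
 (name index) ; placeholder name, adjust to the real entry point
 (modes js)
 (js_of_ocaml
  (flags (:standard --no-inline --pretty))))
```

Dropping --pretty again once you are done inspecting keeps the release output compact.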

It’s a bit off-topic, but I don’t believe compilers should have a notion of release vs dev builds; that’s a build system concept. So if some option is better for release builds, the build system should pass it to the compiler, the compiler should not have to infer from unrelated flags what the default optimisation level is. (The same applies to ocamlopt, by the way.)

Did you make a typo? Separate compilation is for profile=dev only.

Ah, good catch. Yes

When built with profile=release the build goes through the separate compilation steps

was meant to be

When built without profile=release the build goes through the separate compilation steps

Fixed in the original comment.

Adding
(js_of_ocaml (flags (:standard --no-inline)))
to the dune file in bin/web speeds up the release benchmark by quite a lot.

Super good find! Adding the --no-inline flag to the release build resulted in ~400FPS in my local environment, which is ~3.3 times more than release without --no-inline and ~1.5 times more than non-release (i.e. dev) build. I have summarized the result in the below chart:
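[Chart: jsoo_compare, comparing the FPS of the dev build, the release build, and the release build with --no-inline]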

The "release with --no-inline" build is also much smaller than the dev build: the JS bundle size went from 1MB to 52KB, which is nice.

So in summary, in this case, the cause of the poorly performing release build was inlining (this may be related to some JIT optimization as @OCamlUser guessed, but I am not sure). My non-release dev build was probably performing better because separate compilation happened to prevent that inlining.

I kind of assumed that some optimization was missing from the release build and didn’t think of the possibility that the optimization itself was the cause.

@OCamlUser Thanks a lot! Will mark your comment as the solution.

You are referring to the fact that --opt=3 is not being passed in the release build, right? Then it looks like that is expected behavior according to @rgrinberg (I personally don’t have strong opinions on this).

No, I was talking about your second point (originally “when built with profile=release”), but it looks like it was a misunderstanding.

Though still, the fact that (without --no-inline) release mode is still almost 3x slower than dev mode, is weird.

i was talking about your second point (originally “when built with profile=release”), but it looks like it was a misunderstanding.

Ah, that makes sense, my bad.

Though still, the fact that (without --no-inline ) release mode is still almost 3x slower than dev mode, is weird.

Yeah, my emulator (or emulators in general) might have unusual characteristics that cause this, but the difference does seem extreme. I am thinking of taking another look and creating an issue in Issues · ocsigen/js_of_ocaml · GitHub in case there are some improvements that can be made here.
