TechEmpower benchmark: httpaf + lwt + unix on par with Haskell's warp

The recent discussions around this benchmark (started by @rbjorklin) and the OCaml results piqued my curiosity. After all, one of OCaml’s strengths is its performance! It should smoke node.js and be in the same ballpark as Haskell, Go, etc., right? Well, there is only one way to find out!

To start simple, I wanted to establish a baseline using a pre-forking approach with a listening socket shared by all children and go from there. It turns out I didn’t need to go far at all. This simple architecture was enough to leave node.js in the dust and get results similar to Haskell and Go. This says a lot about the current quality of OCaml and the ecosystem we have today! Handshakes all around! :beer::camel:
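To make the architecture concrete, here is a minimal sketch of pre-forking with a shared listening socket. It is not the benchmark code: it uses only blocking calls from the Unix module instead of httpaf and Lwt, and the names make_listener, worker_loop and prefork, as well as the canned response, are made up for the illustration.

```ocaml
(* Pre-forking sketch (illustrative only, assumes the unix library is linked).
   One listening socket is created in the parent, and every forked child
   inherits it and calls accept on it, so the kernel spreads incoming
   connections across the workers. *)

(* Bind and listen on the loopback interface; port 0 asks for any free port. *)
let make_listener port =
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.setsockopt sock Unix.SO_REUSEADDR true;
  Unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_loopback, port));
  Unix.listen sock 1024;
  sock

(* Hypothetical fixed response, standing in for a real HTTP handler. *)
let response =
  "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"

(* Each worker accepts on the shared socket, reads whatever the client
   sent, answers and closes the connection. It never returns normally. *)
let rec worker_loop sock =
  let client, _addr = Unix.accept sock in
  let buf = Bytes.create 4096 in
  ignore (Unix.read client buf 0 (Bytes.length buf));
  ignore (Unix.write_substring client response 0 (String.length response));
  Unix.close client;
  worker_loop sock

(* Fork [workers] children and return their pids to the parent. *)
let prefork ~workers sock =
  List.init workers (fun _ ->
    match Unix.fork () with
    | 0 ->
      (* Child: serve until an exception occurs, then leave. *)
      (try worker_loop sock with _ -> ());
      exit 0
    | pid -> pid)
```

For example, prefork ~workers:4 (make_listener 8080) would start four workers sharing port 8080 (both numbers are arbitrary); the parent then only has to wait on the returned pids with Unix.waitpid.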

You can find the results for the JSON benchmark here. Be sure to check the Latency tab. Note that a lot of the top performers optimize aggressively by precomputing responses, using object pools, etc.

From my limited testing, the big difference from the previous OCaml attempts might be that they had to use (the otherwise amazing) haproxy to load-balance between all cores. Pre-forking and sharing a socket removes that need, so that all cores can do useful work. I also had some fun using SIGALRM to render the date only once per second.
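The SIGALRM trick can be sketched roughly as follows. This is an illustration under assumptions rather than the code from the benchmark entry: http_date, cached_date and start_date_refresher are hypothetical names, and the date is formatted by hand in the usual HTTP style.

```ocaml
(* Date caching sketch (illustrative only, assumes the unix library is linked).
   The formatted date is kept in a reference and refreshed once per second
   by a SIGALRM handler, so request handlers only read a ready-made string. *)

let days = [| "Sun"; "Mon"; "Tue"; "Wed"; "Thu"; "Fri"; "Sat" |]

let months =
  [| "Jan"; "Feb"; "Mar"; "Apr"; "May"; "Jun";
     "Jul"; "Aug"; "Sep"; "Oct"; "Nov"; "Dec" |]

(* Render the current time, e.g. "Sun, 06 Nov 1994 08:49:37 GMT". *)
let http_date () =
  let { Unix.tm_wday; tm_mday; tm_mon; tm_year; tm_hour; tm_min; tm_sec; _ } =
    Unix.gmtime (Unix.time ())
  in
  Printf.sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT"
    days.(tm_wday) tm_mday months.(tm_mon) (tm_year + 1900)
    tm_hour tm_min tm_sec

(* The cached value that request handlers would put in the Date header. *)
let cached_date = ref (http_date ())

(* Install the handler and arm a real-time timer that fires every second. *)
let start_date_refresher () =
  Sys.set_signal Sys.sigalrm
    (Sys.Signal_handle (fun _ -> cached_date := http_date ()));
  ignore
    (Unix.setitimer Unix.ITIMER_REAL
       { Unix.it_interval = 1.0; it_value = 1.0 })
```

A handler would then read !cached_date instead of formatting the time on every request. One thing to keep in mind with this approach is that blocking Unix calls can be interrupted by the signal and fail with EINTR, so code using them directly has to be prepared to retry.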

As a side note, it was my first time using these UNIX APIs from OCaml and it was an eye-opening experience to be able to leverage all that power outside of C and without giving up any of OCaml’s strengths.

I’m happy with the results, but it should be possible to improve even further by:

  • profiling with perf to know where time is spent, e.g. in JSON encoding, allocations, GC, context switches, etc.
  • using libuv instead of libev, maybe via this PR to Lwt
  • switching to Multicore domains for a pre-threaded architecture instead of a pre-forked one, which should hopefully reduce context switch costs, see Linux Applications Performance

Contributing is pretty easy: clone this repo and run ./tfb --test httpaf --type json --concurrency-levels 512.