Lwt multi-processing much more performant than eio multi-core?

I tried an experiment, but this time with httpcats and miou.

From what I can see, lwt still achieves the highest request rate. I think this is mainly because, even though Miou offers a domain pool, the domains still have to synchronize with each other, which the lwt setup never has to do.

Using Lwt_unix.fork (rather than Stdlib.Domain.spawn) also avoids synchronizing the OCaml major heap between domains. Furthermore, the httpaf+lwt case is closer to 32 executables (one per core), each acting as a web server, than to a single executable running OCaml tasks in parallel.
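To make the "32 executables" picture concrete, here is a minimal pre-fork sketch. It is not the benchmark code: it uses plain Unix.fork instead of Lwt_unix.fork, the worker count and the one-connection-per-worker behaviour are assumptions chosen so that the program terminates, and it needs the unix library.

```ocaml
(* Pre-fork sketch with plain Unix (link the unix library). *)

let workers = 4 (* hypothetical; the benchmark above used one per core *)

(* Each worker answers a single connection with its pid and exits, so that
   the sketch terminates. A real server would loop on accept. *)
let worker listen_fd =
  let client, _ = Unix.accept listen_fd in
  let reply = Printf.sprintf "pid %d\n" (Unix.getpid ()) in
  ignore (Unix.write_substring client reply 0 (String.length reply));
  Unix.close client;
  Unix._exit 0

let run () =
  let listen_fd = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  (* Port 0 lets the kernel pick a free port on the loopback interface. *)
  Unix.bind listen_fd (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listen_fd 16;
  let addr = Unix.getsockname listen_fd in
  (* Fork after listen: every child inherits the same listening socket,
     but each one has its own heap and its own GC. *)
  let pids =
    List.init workers (fun _ ->
        match Unix.fork () with
        | 0 -> worker listen_fd
        | pid -> pid)
  in
  (* The parent plays the client: one connection per worker. *)
  let replies =
    List.init workers (fun _ ->
        let fd = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
        Unix.connect fd addr;
        let ic = Unix.in_channel_of_descr fd in
        let line = input_line ic in
        close_in ic;
        line)
  in
  List.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids;
  Unix.close listen_fd;
  (pids, replies)

let pids, replies = run ()
let () = List.iter print_endline replies
```

Each reply comes from a different process: nothing is shared between the workers except the listening socket, which is why no synchronization is needed.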

However, I noticed that httpcats does better than httpun+eio. Here is a summary table[1] of the average latency reported by wrk/tfb:

| Configuration | httpaf+lwt | httpun+eio | httpcats |
|---|---|---|---|
| 8 clients, 8 threads | 19.05us | 327.14us | 32.54us |
| 512 clients, 32 threads | 8.5ms | 1.88ms | 1.07ms |
| 16 clients, 16 threads | 29.75us | 808.30us | 39.22us |
| 32 clients, 32 threads | 39.21us | 1.23ms | 64.83us |
| 64 clients, 32 threads | 425.44us | 1.26ms | 124.32us |
| 128 clients, 32 threads | 250.84us | 1.15ms | 263.56us |
| 256 clients, 32 threads | 2.51ms | 1.25ms | 471.59us |
| 512 clients, 32 threads (warmed) | 10.83ms | 1.89ms | 0.98ms |

Note that httpcats copes with many clients better than httpaf+lwt and httpun+eio (it has the lowest latency of the three at 512 clients). Compared to httpaf+lwt, this may be because Miou asks the system for events (such as the arrival of a new connection) more often than lwt does. In fact, lwt tends to keep running the OCaml tasks it already has rather than periodically asking for new events, so it prioritizes handling an HTTP request over accepting a new connection.

Here is the average number of requests per second reported by wrk/tfb:

| Configuration | httpaf+lwt | httpun+eio | httpcats |
|---|---|---|---|
| 8 clients, 8 threads | 51.26k req/s | 25.37k req/s | 33.29k req/s |
| 512 clients, 32 threads | 45.65k req/s | 14.56k req/s | 16.65k req/s |
| 16 clients, 16 threads | 35.22k req/s | 13.83k req/s | 27.49k req/s |
| 32 clients, 32 threads | 25.44k req/s | 13.37k req/s | 16.6k req/s |
| 64 clients, 32 threads | 38.12k req/s | 12.08k req/s | 17.45k req/s |
| 128 clients, 32 threads | 41.31k req/s | 13.27k req/s | 18.1k req/s |
| 256 clients, 32 threads | 43.96k req/s | 14.03k req/s | 17.96k req/s |
| 512 clients, 32 threads (warmed) | 44.78k req/s | 14.37k req/s | 16.82k req/s |

As I said, lwt outperforms the others, but keep in mind that its implementation consists of 32 programs (one per core, which do not share a GC) handling all the requests. With httpun+eio or httpcats, there really are 32 domains sharing the same major heap, with synchronization mechanisms (mutexes and conditions) both in the OCaml runtime and in what eio or Miou provide.

Furthermore, writing an application that shares a global resource between all the HTTP request handlers might be more difficult when those handlers are spawned with Lwt_unix.fork than with httpun+eio or httpcats.
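A small sketch of why sharing is harder with fork, assuming OCaml 5 and the unix library. The counter name is hypothetical, and plain Unix.fork stands in for Lwt_unix.fork. A forked child works on its own copy of the heap, so its update never reaches the parent, whereas a domain updates the very same value.

```ocaml
(* A stand-in for a global resource that request handlers would update. *)
let hits = Atomic.make 0

(* fork: the child increments its own copy of [hits] and exits.
   This runs before any domain is spawned, since the OCaml 5 runtime
   refuses Unix.fork once domains have been created. *)
let seen_after_fork =
  match Unix.fork () with
  | 0 ->
      Atomic.incr hits;
      Unix._exit 0
  | pid ->
      ignore (Unix.waitpid [] pid);
      Atomic.get hits

(* domain: the increment happens on the shared heap. *)
let seen_after_domain =
  Domain.join (Domain.spawn (fun () -> Atomic.incr hits));
  Atomic.get hits

let () =
  Printf.printf "after fork: %d, after domain: %d\n"
    seen_after_fork seen_after_domain
(* prints "after fork: 0, after domain: 1" *)
```

With forked workers, the shared state has to live outside the processes (a pipe, a socket, a database, shared memory), which is the extra difficulty mentioned above.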

Finally, one last note: httpcats uses miou.unix, which uses Unix.select. Criticizing that choice is fairly legitimate, since select has quite a few limitations (in particular on the number of file descriptors it can manage), but it is also something that can easily be improved. At least, Miou's design[2] lets you inject your own system-event logic, such as the Solo5 one for unikernels.
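For readers who have not used it, here is a minimal sketch of polling with Unix.select from the unix library. This is not how miou.unix is written, only an illustration of the primitive; the helper name `readable` is hypothetical.

```ocaml
(* Ask the system, without blocking (timeout 0.0), whether [fd] has data.
   select works on descriptor sets bounded by FD_SETSIZE (commonly 1024 on
   Linux), which is the limitation on the number of file descriptors. *)
let readable fd =
  match Unix.select [ fd ] [] [] 0.0 with
  | [], _, _ -> false
  | _ -> true

let before, after =
  let r, w = Unix.pipe () in
  let before = readable r in (* nothing written yet *)
  ignore (Unix.write_substring w "x" 0 1);
  let after = readable r in (* one byte is now waiting *)
  Unix.close r;
  Unix.close w;
  (before, after)

let () = Printf.printf "before: %b, after: %b\n" before after
(* prints "before: false, after: true" *)
```

A scheduler calls something like this with all the descriptors it is waiting on; how often it does so, compared to running pending tasks, is the trade-off discussed above.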

Above all, I want to point out that, as far as I can tell, lwt uses libev in your example and eio uses io_uring. Despite Miou's handicap (due to Unix.select), the performance that httpcats offers is still interesting :slight_smile: [3].

Finally, I would also like to mention that if you would like to go further with HTTP, we are currently developing vif: a small web framework based on httpcats. EDIT: vif is very experimental, even if we continue to develop it, don’t expect everything to work without a hitch!


  1. My CPU is an AMD Ryzen 9 7950X.

  2. In particular, you might like to take the time to read this short tutorial explaining how to inject your own system to manage system events, and we could very easily imagine miou+io_uring.

  3. Comparisons between schedulers can always be difficult. As mentioned in the README.md of httpcats, having a well-defined and reproducible protocol in order to offer reliable metrics is already a job in itself that always goes much further than launching a simple program like wrk.
