Lwt multi-processing much more performant than eio multi-core?

dinosaure · March 30, 2025, 2:46pm

I tried an experiment, but this time with httpcats and miou.

From what I can see, lwt still achieves the highest request rate. I think this is mainly because, even though Miou offers a domain pool, the domains still have to synchronize with each other, which the lwt setup never needs to do.

Using Lwt_unix.fork (rather than Stdlib.Domain.spawn) also avoids synchronizing the OCaml major heap between domains. In other words, the httpaf+lwt case is closer to 32 executables (for 32 cores), each acting as a web server, than to a single executable running OCaml tasks in parallel.
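To make the difference between the two models concrete, here is a minimal sketch. It assumes OCaml 5 and the unix library, and it uses plain Unix.fork as a stand-in for Lwt_unix.fork; the names `hits`, `after_fork` and `after_domain` are only for illustration.

```ocaml
(* Sketch only: needs OCaml 5 and the unix library. *)

(* A mutable value living on the OCaml heap of the current process. *)
let hits = ref 0

(* Process model: the child works on its own copy of the heap, so its
   write is invisible to the parent. *)
let after_fork () =
  match Unix.fork () with
  | 0 ->
      hits := 42;
      Unix._exit 0 (* leave the child without running at_exit handlers *)
  | pid ->
      let _ = Unix.waitpid [] pid in
      !hits

(* Domain model: the new domain runs in the same process and shares the
   heap, so its write is visible once the domain has been joined. *)
let after_domain () =
  let d = Domain.spawn (fun () -> hits := 42) in
  Domain.join d;
  !hits

(* Fork first: in OCaml 5, Unix.fork is documented to fail once a domain
   has been spawned. *)
let forked = after_fork ()
let spawned = after_domain ()
```

Here `forked` stays at 0 while `spawned` is 42: the forked worker cannot disturb the parent (nothing to synchronize), while the domain can (so the runtime has to coordinate the heap).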

However, I have noted that httpcats does better than what httpun+eio can offer. Here is a summary table^[1] of the average latency reported by wrk/tfb:

Configuration	httpaf+lwt	httpun+eio	httpcats
8 clients, 8 threads	19.05us	327.14us	32.54us
512 clients, 32 threads	8.5ms	1.88ms	1.07ms
16 clients, 16 threads	29.75us	808.30us	39.22us
32 clients, 32 threads	39.21us	1.23ms	64.83us
64 clients, 32 threads	425.44us	1.26ms	124.32us
128 clients, 32 threads	250.84us	1.15ms	263.56us
256 clients, 32 threads	2.51ms	1.25ms	471.59us
512 clients, 32 threads (warmed)	10.83ms	1.89ms	0.98ms

Note that httpcats copes with a large number of clients better than httpaf+lwt and httpun+eio (it has the lowest latency of the three with 512 clients). Compared to httpaf+lwt, this may be because Miou asks the system for events (such as the arrival of a new connection) more often than lwt does. In fact, lwt tends to keep running the OCaml tasks that are already pending rather than periodically asking for new events, so it prioritizes handling an HTTP request over accepting a new connection.
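The effect of polling frequency can be illustrated with a toy model. This is not the actual lwt or Miou scheduler, only a sketch of the two strategies described above; the type `policy` and the function `accept_delay` are hypothetical names.

```ocaml
(* Toy model only: not the real lwt or Miou scheduling code. *)
type policy =
  | Poll_each_task   (* ask the system for events after every task *)
  | Drain_then_poll  (* run every ready task before asking again *)

(* [accept_delay policy ~ready ~arrival] counts how many tasks run between
   the moment a connection arrives (once [arrival] tasks have run) and the
   moment the scheduler polls and notices it. [ready] is the number of
   tasks already sitting in the run queue. *)
let accept_delay policy ~ready ~arrival =
  let rec loop executed remaining =
    let polls_now =
      match policy with
      | Poll_each_task -> true
      | Drain_then_poll -> remaining = 0
    in
    if polls_now && executed >= arrival then executed - arrival
    else if remaining = 0 then 0 (* queue empty before the arrival *)
    else loop (executed + 1) (remaining - 1)
  in
  loop 0 ready

let eager = accept_delay Poll_each_task ~ready:100 ~arrival:10
let lazy_ = accept_delay Drain_then_poll ~ready:100 ~arrival:10
```

With 100 ready tasks and a connection arriving after the 10th, the first policy notices it immediately (`eager` is 0) while the second only notices it 90 tasks later (`lazy_` is 90). The price of the first policy, which the model ignores, is one system call per task.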

This is the average number of requests per second reported by wrk/tfb:

Configuration	httpaf+lwt	httpun+eio	httpcats
8 clients, 8 threads	51.26k req/s	25.37k req/s	33.29k req/s
512 clients, 32 threads	45.65k req/s	14.56k req/s	16.65k req/s
16 clients, 16 threads	35.22k req/s	13.83k req/s	27.49k req/s
32 clients, 32 threads	25.44k req/s	13.37k req/s	16.6k req/s
64 clients, 32 threads	38.12k req/s	12.08k req/s	17.45k req/s
128 clients, 32 threads	41.31k req/s	13.27k req/s	18.1k req/s
256 clients, 32 threads	43.96k req/s	14.03k req/s	17.96k req/s
512 clients, 32 threads (warmed)	44.78k req/s	14.37k req/s	16.82k req/s

As I said, lwt outperforms the others, but you always have to keep in mind that its implementation consists of 32 programs (one per core, which don’t share the same GC) that handle all the requests. In the case of httpun+eio or httpcats, there are instead 32 domains sharing the same major heap, with synchronization mechanisms (mutexes and conditions) both in the OCaml runtime and in what eio or miou offer.
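As a rough idea of what such synchronization looks like at the user level, here is a small sketch where several domains update one shared counter. It assumes OCaml 5 and only uses the Stdlib; the numbers of domains and iterations are arbitrary, and this is not code taken from eio or miou.

```ocaml
(* Sketch only: needs OCaml 5 (Domain, Mutex and Atomic are in the Stdlib). *)
let domains = 4
let per_domain = 10_000

(* Shared counter protected by a mutex: every increment takes the lock,
   so the domains regularly wait for each other. *)
let with_mutex () =
  let lock = Mutex.create () in
  let total = ref 0 in
  let work () =
    for _ = 1 to per_domain do
      Mutex.lock lock;
      incr total;
      Mutex.unlock lock
    done
  in
  List.init domains (fun _ -> Domain.spawn work) |> List.iter Domain.join;
  !total

(* Same counter with an atomic: no lock, but the domains still contend
   for the same memory location. *)
let with_atomic () =
  let total = Atomic.make 0 in
  let work () =
    for _ = 1 to per_domain do
      Atomic.incr total
    done
  in
  List.init domains (fun _ -> Domain.spawn work) |> List.iter Domain.join;
  Atomic.get total
```

Both functions return 40000. Forked processes would never pay for this coordination, simply because they have nothing in common to coordinate.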

Furthermore, writing an application that shares a global resource between all the HTTP request handlers spawned with Lwt_unix.fork might be more difficult than with httpun+eio or httpcats.
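To give an idea of the extra work, here is a minimal sketch where forked workers have to go through a pipe to report a value to their parent, since they cannot simply write into a shared OCaml value. It assumes the unix library and uses plain Unix.fork in place of Lwt_unix.fork; `workers` and `collect` are hypothetical names.

```ocaml
(* Sketch only: needs the unix library. *)
let workers = 4

(* Each worker sends one byte (its own number) through a pipe and the
   parent adds them up. *)
let collect () =
  let r, w = Unix.pipe () in
  let pids =
    List.init workers (fun i ->
        match Unix.fork () with
        | 0 ->
            Unix.close r;
            let msg = Bytes.make 1 (Char.chr (i + 1)) in
            ignore (Unix.write w msg 0 1);
            Unix._exit 0 (* also closes the child's copy of [w] *)
        | pid -> pid)
  in
  (* The parent closes its write end so that [read] returns 0 once every
     worker has exited. *)
  Unix.close w;
  let buf = Bytes.create 1 in
  let rec sum acc =
    match Unix.read r buf 0 1 with
    | 0 -> acc
    | _ -> sum (acc + Char.code (Bytes.get buf 0))
  in
  let total = sum 0 in
  Unix.close r;
  List.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids;
  total
```

With domains, the same exchange is a write to a shared (and properly synchronized) value; with processes, every shared resource needs some explicit channel like this one.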

Finally, one last note: httpcats uses miou.unix, which uses Unix.select. It is a fairly legitimate criticism that something other than select should be used, as it has quite a few limitations (in particular on the number of file descriptors that can be managed). But this is also something that can easily be improved: the design of Miou^[2] lets you inject your own logic for system events, such as Solo5’s for unikernels.
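For readers who have never used it, this is what a call to Unix.select looks like. It is only a sketch of the system call with a pipe, assuming the unix library, and not the way miou.unix is written. The limitation mentioned above comes from the underlying select(2), whose descriptor sets are bounded by FD_SETSIZE (commonly 1024 on Linux).

```ocaml
(* Sketch only: needs the unix library. *)
let ready_before, ready_after =
  let r, w = Unix.pipe () in
  (* Nothing written yet: with a zero timeout, select returns at once and
     reports no readable descriptor. *)
  let before, _, _ = Unix.select [ r ] [] [] 0.0 in
  ignore (Unix.write_substring w "x" 0 1);
  (* One byte is pending: the read end is now reported as readable. *)
  let after, _, _ = Unix.select [ r ] [] [] 0.0 in
  Unix.close r;
  Unix.close w;
  (List.length before, List.length after)
```

The scheduler has to pass the full list of descriptors it is interested in on every call, which is one reason why other mechanisms scale better with many connections.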

What I want to mention above all is that it seems to me that lwt uses libev in your example and eio uses io_uring. Despite Miou’s penalty (due to Unix.select), the performance that httpcats offers is still interesting^[3].

Finally, I would also like to mention that if you would like to go further with HTTP, we are currently developing vif, a small web framework based on httpcats. EDIT: vif is very experimental; even though we continue to develop it, don’t expect everything to work without a hitch!

^[1] My CPU is an AMD Ryzen 9 7950X.
^[2] In particular, you might like to take the time to read this short tutorial explaining how to inject your own system to manage system events, and we could very easily imagine miou+io_uring.
^[3] Comparisons between schedulers can always be difficult. As mentioned in the README.md of httpcats, having a well-defined and reproducible protocol in order to offer reliable metrics is already a job in itself that always goes much further than launching a simple program like wrk.
