Cool, thanks. This is not too bad for the threads version. I had assumed that threads at 10K connections would be significantly worse, but contrary to my assumptions, the thread-based (one-thread-per-connection) model seems quite competitive.
[Full disclosure: I prefer to use threads/mutex/condvar instead of CSP/CML-like concurrency, mostly because my systems projects often involve a ton of C/C++ code, and it's just a -necessity-. That said, I've built code using monads, specifically for a web-stress-tester that needed to execute "click-trail scripts" of some complexity.]
If you're -actually- wanting to build systems that will service 10k connections, then I think you will have to go with that I/O monad. Sure, you can do it with threads, but you're going to want epoll() and probably other things. You might want more than one native "worker" thread in order to soak up CPU, but you're not going to want 10k threads. When you use native threads, you give up control over the implementation of "wait for an external event", and that can be somewhat performance-critical. Also, going with native threads means that you have to worry about all the ways in which threads can hang, GC, etc. For instance, John Reppy went to some length in CML to ensure that "threads that provably could not make progress" would be GCed. As it turns out, at least as of a couple of years ago, Golang's goroutines had not implemented that sort of functionality.
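For concreteness, the one-thread-per-connection model being compared here can be sketched with nothing but OCaml's stdlib `Unix` and `Thread` modules. This is a minimal sketch under my own naming (`handle_client`, `serve` are hypothetical); a real server needs real error handling and a strategy for capping thread count:

```ocaml
(* One-thread-per-connection echo server sketch, OCaml stdlib only
   (compile with the threads library). *)

(* Echo everything read from [fd] back until EOF, then close it. *)
let handle_client fd =
  let buf = Bytes.create 4096 in
  let rec loop () =
    let n = Unix.read fd buf 0 (Bytes.length buf) in
    if n > 0 then begin
      ignore (Unix.write fd buf 0 n);
      loop ()
    end
  in
  (try loop () with Unix.Unix_error _ -> ());
  Unix.close fd

(* Accept loop: spawn one native thread per accepted connection.
   At 10k connections this means 10k native threads, which is
   exactly the cost being debated in this thread. *)
let serve port =
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.setsockopt sock Unix.SO_REUSEADDR true;
  Unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_any, port));
  Unix.listen sock 128;
  while true do
    let (client, _addr) = Unix.accept sock in
    ignore (Thread.create handle_client client)
  done
```

The point of the sketch is that the blocking `Unix.read`/`Unix.write` calls are exactly the "wait for an external event" whose implementation you surrender to the OS scheduler with this model.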
It's actually worse than that. If (again) you're looking to hit 10k connections, I suspect you'll find that NO off-the-shelf I/O monad framework is enough; you'll end up having to hack the internals of whatever you choose, because almost nobody has this sort of problem.
Which brings me to my punchline: I think it's not so useful to think about "10k conns" as a way of evaluating concurrency solutions. If you build a system that needs hundreds of threads, you're probably already succeeding and can afford to revisit its implementation. I'd suggest that it makes much more sense to pick a concurrency framework based on other considerations: easy access to native libraries, programmer experience with I/O monads (or serious willingness and time to learn), how many libraries would need to be rewritten in monadic style, error-handling, etc.
As I said, I wrote a rather complete I/O monad implementation for this web-stress-tester, and while it was "just fine/fine/fine" for writing code, I never used that framework again; typically I don't need to support a thousand threads, and at that point, fugeddaboudit, I'm goin' with thread/mutex/condvar.
Thanks @Chet_Murthy for your wonderful answer.
Indeed, I don't think my current Lwt solution has to support 10k concurrent connections. I am writing a library to support the FCGI protocol for my web applications. It currently uses Lwt, which I only recently learned myself. From that experience, I realized that even for an experienced OCaml programmer without any background in monads and such, learning and using the Lwt monad is a serious investment of time and effort. This made me realize that users of my library would have to put in perhaps the same amount of learning (for Lwt) just to use it. Perhaps this is not so much of a learning curve, but I couldn't help wondering whether plain threads would be sufficient performance-wise while removing the Lwt learning curve for the users of my library.
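To make the "learning curve" concrete: the investment is mostly in getting comfortable with chaining computations through a bind operator instead of writing direct-style code. Below is a toy thunk-based monad, not Lwt itself, just sketching the shape of `>>=` chaining (in real Lwt the values are promises that can suspend on I/O, which this toy version cannot do):

```ocaml
(* Toy "deferred" monad: a computation is just a thunk. This only
   illustrates the monadic *style*; real Lwt promises also handle
   suspension, wakeup, and exception propagation. *)
type 'a t = unit -> 'a

let return (x : 'a) : 'a t = fun () -> x

let ( >>= ) (m : 'a t) (f : 'a -> 'b t) : 'b t =
  fun () -> f (m ()) ()

(* Each step receives the previous step's result by name, Lwt-style,
   instead of via ordinary let-bindings. *)
let script =
  return "GET /login" >>= fun req ->
  return (String.length req) >>= fun len ->
  return (Printf.sprintf "%s (%d bytes)" req len)

let () = print_endline (script ())  (* prints "GET /login (10 bytes)" *)
```

Every library function your users touch returns an `'a t` rather than an `'a`, which is why, as you say, the monadic choice leaks into your library's public API and onto its users.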
Is this really a problem in practice? I haven't used native OCaml threads at all, so I'm curious what your experience has been with them.
[OK, old-skool web-TP thoughts …]
TL;DR: why not just do FCGI with a single (serially-reused) process per request, and see how far you get?
Is this for an FCGI back-end? That is, there'll be a webserver in front, calling into OCaml code running behind the FCGI protocol? If so, do you really need Lwt? Or even threads? Here's why I'm asking:
1. Typically a webserver will absorb almost all the concurrency coming from the network. It has to buffer requests and responses anyway (in order to do parsing, routing, etc.), and that's on top of socket buffers. Except for the largest requests/responses, that's typically sufficient. And it's rare that such large (e.g. media) requests are handled via FCGI.
2. The value of Lwt goes down if there's no I/O concurrency to be had.
3. There is value in the process isolation that comes with one-process-per-request (of course, that process gets reused serially).
4. If the intent is to use shared variables in the process as a sort of "database" … well, that can work, but historically it's been found that it's better to put such shared mutable data in an external store (if nothing else, a local memcached); this is an aid to debugging, as well as making for more robust systems.
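The shared-mutable-data pattern that item 4 cautions against looks deceptively simple in code. A minimal sketch with OCaml's stdlib `Mutex` and `Hashtbl` (names are mine): the code is easy to write, but the state vanishes on a crash or restart and is invisible to other processes, which is the debugging/robustness argument for an external store.

```ocaml
(* In-process shared "database": a Hashtbl guarded by a Mutex.
   Works within one process, but state dies with the process --
   the reason item 4 above prefers an external store. *)
let table : (string, string) Hashtbl.t = Hashtbl.create 16
let lock = Mutex.create ()

let put k v =
  Mutex.lock lock;
  Hashtbl.replace table k v;
  Mutex.unlock lock

let get k =
  Mutex.lock lock;
  let r = Hashtbl.find_opt table k in
  Mutex.unlock lock;
  r
```

In the one-process-per-request FCGI model this pattern cannot work at all across requests served by different processes, which naturally pushes the state out to memcached or a database.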
If we look back at the history of transaction-processing, we can see this pattern repeat itself:
- Originally, CICS was akin to this Lwt approach, while IMS/DL1 was more like FCGI. And (it turned out) CICS got used for more-lightweight transactions, where IMS/DL1 got used for more heavyweight ones.
- The web started off with CGI (ugh) and FCGI (as well as variants like mod_perl) and moved toward Java[1] with shared processes and threads. This was … problematic for reasons #3/#4 above, and lots of web-app frameworks continue to use "one request at a time per process" models for application code.
- The one place where lightweight concurrency has really stuck is in dealing with reverse-AJAX and other models (like websockets) that use massive concurrency to allow the server to push content to the client. But this is really different from client->server RPC, and it would be (IMO) a mistake to try to fit them into the same codebase and runtime.
[1] The push for "multiple concurrent requests in a single process" in Java was mostly due to the enormous weight of a Java process, both in memory and startup time.
This is a problem for all applications in complex transaction-processing systems. Unless your application code is vanishingly simple, eventually somebody's going to write something that causes a hang.
[The rest is written partially from memory, partially from a quick scan of the Apache mod_fcgid documentation; I could be wrong about this …]
Also, as I think about it, there's another problem you might want to consider: FastCGI was designed with the idea that behind the front-end webserver is a pool of processes. It was not originally designed with the intention that there be a pool of -threads- in a single process behind it. So for instance in Apache mod_fcgid, there are a bunch of different timeouts, and they apply to each process/connection independently. If Apache times out reading a response back from an FCGI connection, it will terminate that connection, but it won't know to (for instance) terminate all connections to the corresponding process.
What I'm saying is: when/if there are "faults" (errors of various kinds), the FastCGI protocol is designed so that recovery can occur on a per-connection basis. If you route all connections to a single process, you're pretty much vitiating that recovery logic. And there isn't any other recovery logic available for the FastCGI protocol.
I might be wrong about this, though; your goals might be different, and the FastCGI protocol guarantees might be different today.
Goroutines are implemented more or less as coroutines, but the scheduler multiplexes them onto a thread pool. I'm not sure of the exact details, but goroutines are a bit heavier than coroutines in other languages; last I heard, each goroutine starts with a stack allocation of a few KB. Because they are multiplexed on OS threads, they don't have explicit break points the way asynchronous coroutines normally do: all of Go's I/O functions and a few others implicitly yield to the scheduler. If a goroutine blocks on something else (like the CPU; Go never blocks on I/O), the other threads in the pool will continue to have work scheduled on them. I'm not sure how Go handles it if all threads in the pool are blocked.
From my very brief experience with OCaml, if you're doing any kind of network I/O with a third-party library, you're either using Async or Lwt (usually the latter), and both are monadic. I don't have a ton of experience with monads myself, but the monadic paradigm presented by these libraries doesn't differ substantially from async and await in languages that have them.
I'm not suggesting that there isn't a learning curve involved, but it's something most people working with network I/O are going to have to deal with at some point anyway.
Just discovered this gem which discusses the exact same issue the current thread is trying to address.
TL;DR: pthreads/systhreads is quite performant compared to poll/epoll techniques. Additionally, it seems this multicore PR - Reimplementing Systhreads with pthreads (Domain execution contexts) by Engil · Pull Request #381 · ocaml-multicore/ocaml-multicore · GitHub - enables true parallelism in addition to concurrency.
Of note as well, the changes include the ability to run decs/systhreads across many domains at once.
The article you linked is an interesting read. This talk on YouTube also explains the difference in an easily digestible way.
Looking at awesome-ocaml, there doesn't seem to be a web framework that relies on threading. Is there one out there somewhere?
It's a very far cry from a framework, but my tiny httpd relies on threads and works pretty well for HTTP 1.1.