threading/IO monad vs threads: the case of web-app servers

OK, so this is standard, and that’s (most definitely) good. If you’re not using any in-process session state, then there’s no need for mutexes except for things like connection pools, the actual network runtime, etc. It’s possible that RPCs to backend systems might need to run in parallel, and for that you might want to write a little thread-pool-based harness to manage backend RPCs, but that would be it.

Which leads me to ask: why bother with monadic concurrency? I mean, what’s it buying you? [other than, as Vincent was honest enough to say, publications] You have no actual interference between concurrent transactions, since they share no state. Why bother with all the extra work and mental model?

ETA: you can assume I know how modern (and pre-modern) web-app systems are built: I spent my career debugging web-app dumpster fires, including at Twitter in the summer of 2008. So none of this is unfamiliar; to the contrary, I’m trying to find out what you’re doing that’s different from standard stuff.

1 Like

Well, we need (or want) concurrency to manage non-blocking I/O, and monadic concurrency happens to be a convenient way to do that.

If you’re asking in terms of scaling: as you mentioned in a previous post, if you need to scale up then monadic concurrency (in OCaml, at least) makes more efficient use of resources. Context switching between OS threads is expensive, requiring a round trip from user space to kernel space and back for each switch; in most monadic concurrency implementations you just get your list of events once and then rip through them in user space.

Additionally, depending on your architecture, your request handler may need to do many things at once, and if you use threads that means you may need to handle concurrency correctly in more places than just the low-level infrastructure code. For example, I am working on a tool that adds some useful features on top of an existing set of services, so for UX reasons (latency) I shotgun-blast my requests to those services and collect the results rather than issue them one at a time. Another pattern I end up using is a shared resource, for example metrics, on that process, which gets accumulated over time and requested infrequently. You can, of course, do all that with OS threads, and I don’t really think monadic programming is that much different from direct programming in these scenarios (although I AM looking forward to what algebraic effects gets us), but the lack of headaches from not needing the common thread primitives makes monadic concurrency the more compelling option.

But, perhaps Caramel, being worked on by @ostera, will make OCaml-on-the-BEAM more compelling than monadic concurrency.

1 Like

FWIW, Async uses a similar model to Lwt (invented independently, amusingly enough; I’m not sure which one was started first, but we didn’t realize Lwt existed until later), and we use it at the heart of all sorts of applications we build, web and non-web. Especially with some syntactic support (like ppx_let or the new built-in let-syntax) we find it to be a pretty easy model to learn and use.
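For readers who haven’t seen the syntactic support mentioned here, a minimal Lwt sketch of what it looks like (the Async/ppx_let version is analogous; `fetch` is a hypothetical non-blocking call, stubbed out here so the snippet is self-contained):

```ocaml
(* Minimal sketch of monadic concurrency with let-syntax.
   [fetch] is a hypothetical non-blocking request; it is
   stubbed with [Lwt.return] so the example stands alone. *)
let fetch _url = Lwt.return 1

let total =
  let open Lwt.Syntax in
  let* a = fetch "http://a" in   (* suspends until [a] is determined *)
  let* b = fetch "http://b" in
  Lwt.return (a + b)

let () = assert (Lwt_main.run total = 2)
```

With `let*` sugaring away the explicit `Lwt.bind` calls, the code reads almost like direct style, which is much of why the model is easy to pick up.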

The upside seems clear enough to me: it gives you a simple and general-purpose way of managing concurrency that minimizes race conditions. We find that even when writing in a mostly functional style (which is not always what we do!) our services generally involve some amount of state management, which means that race conditions are a real thing to worry about.

Chet, I find it a bit hard to understand your perspective. You seem to view the complexity of the solution (monadic concurrency) as much higher than I do, and the value of what it provides as much lower, and I don’t really understand either side of the argument. I guess I can say that we apply all of this at decently high scale (20m lines of code, 600-ish developers, hundreds of applications), and at least in our ecosystem, the value proposition of Async seems clear.

y

10 Likes

Yaron,

There are two things I find troubling about monads. I’ll specifically address monadic concurrency, and not the result monad, simply for concreteness:

(1) monadic concurrency performs better than “direct style+threads” when the “presented concurrency” of a workload is much, much higher than the “active concurrency” of that workload. So for instance, in a message-dispatcher, or an app-server that is almost-exclusively used for long-poll-style requests. In both cases, there will be a ton of connections, each of which will have presented a “request” and is waiting for a “response”.

But this isn’t the common case even today: the common case today is requests small enough to fit into buffers, with responses generated that might or might not fit into buffers, but usually do. The “presented concurrency” is high because requests span multiple packets and take time to flow in completely. For this case, it suffices to put a proxy in front of the app-server (which ought to be done for other reasons, like security) that buffers requests and does partial buffering of responses. This allows for app-server processes to only deal with fully-presented requests (except when that request is a POST/PUT with a large payload, e.g. media) and hence the presented concurrency is the same as the active concurrency.
When that’s the case, monadic concurrency has significant overheads – after all, there’s a reason that SML/NJ, back in the day, got beat by caml-light: SML/NJ consed like shit out of a goose, and caml-light did not. Consing rates matter for performance. Staying on the stack matters. That little “benchmark” I posted demonstrates this, even if it weren’t obvious.

(2) If there were a “linter” that guaranteed that all code using a monad abided by the rules of that monad, sure, I might think it’s not such a problem. But there’s no such tool, and frankly, I suspect that we’d find that a lot of code is out there that breaks those rules.
People talk a lot about how, using concurrency monads, you don’t need mutexes to get critical sections over in-process shared state. But for typical transaction-processing apps, the only in-process shared state is in the infrastructure itself. It doesn’t count to say “I can implement my monadic connection pool without any mutexes” when in fact it’s not that hard to write one using mutexes (and it only needs to be done once). [Of course, a monadic connection pool with a hard cap on connections needs the equivalent of condition variables, which … you keep going with this, and you start to long for the simplicity of threads and monitors.]

Maybe a more pithy but slightly pejorative way of putting it is: “sure, you think you’re really getting something out of these monads, but you’re not actually doing anything complicated, so it’s not surprising you can make it work. Try something more complicated, and it’ll be … really painful.”

An example I’ve mentioned before: I worked with a “green threads” implementation in Python (so: one pthread onto which a bunch of Python stacks were managed as threads). The people I was working with (at Second Life) were accessing Mysql, and using native C code libraries. This obviously did not work, and I wrote for them a little “native thread pool + trampoline back-and-forth to green threads” so that a green thread that wanted to run a mysql operation would suspend itself in the green thread scheduler, switch to the native thread where the operation got run, and then switch back, re-enabling itself in the green thread scheduler. Needless to say, this involved mutex/condvar etc.

Now, you can argue that nobody needs to ever use native C libraries: we can always rewrite those libraries in OCaml+Lwt/Async, and we’ll be able to run them under Lwt. But that’s … well, it’s got its limitations, and for instance, if I want to use RocksDB, that’s simply not going to work. More generally, if I want to do anything that requires I present concurrency to the kernel (e.g. a storage system talking to disks/SSDs that needs to fill the request queue), you’re going to need to deal with actual pthreads.

I could probably go on, but I’ll stop here.

ETA: maybe another way of thinking about #1 is this: there is a difference between a TP monitor (aka “transactional app server”) and a communications transponder. We shouldn’t mix the two up. Sure, the latter should be implemented using events, not threads. But the former? Nopes.

Having implemented this, I did not long for threads and monitors. It’s really just a list of promises that get determined when a connection becomes available, and a promise that gets determined when a connection is returned. It’s pretty straightforward and avoids the common pitfalls of using threads and condition variables and mutexes incorrectly.
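To make the claim concrete, here is a hedged sketch (all names and structure are mine, not the actual code under discussion) of a capped pool in Lwt, where blocked takers are just a list of unresolved promises:

```ocaml
(* Sketch of a connection pool built on promises instead of
   mutexes/condvars. Illustrative only; a real pool would use a
   FIFO queue for fairness and create connections up to a cap. *)
type 'a pool = {
  mutable free : 'a list;           (* idle connections *)
  mutable waiters : 'a Lwt.u list;  (* resolvers for blocked takers *)
}

let take p =
  match p.free with
  | c :: rest -> p.free <- rest; Lwt.return c
  | [] ->
      (* No connection available: park the caller as a promise. *)
      let promise, resolver = Lwt.wait () in
      p.waiters <- resolver :: p.waiters;
      promise

let give_back p c =
  match p.waiters with
  | r :: rest -> p.waiters <- rest; Lwt.wakeup_later r c
  | [] -> p.free <- c :: p.free
```

Note that under a cooperative scheduler the mutable fields need no lock: no other Lwt “thread” can run between the `match` and the assignment, which is exactly the mutex-free property being discussed.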

I’m curious what this looks like: would you have a pointer to code to share?

@Chet_Murthy you mention in different places that you favour direct-style programming with mutexes because it’s “easier” or “just like normal code”, and, conversely, that monads are “a pain” and “difficult to grasp”.

I’ll make two wild guesses: (a) you have used direct-style, mutexful code for a while, and (b) you have originally learned programming in direct-style with mutexes. (How far off am I?) And I think these influence your personal preference for direct-style over monads. There’s nothing wrong with personal preference, but I posit that your familiarity with direct-style mutexful programming blinds you to its complexity. And then it makes IO monads seem comparatively more complex.

I’ve had similar discussions with people insisting that functional programming is difficult to use, hard to understand, etc. (Variations on these discussions include “type systems make programming more complicated,” and “your text editor is impossible to use.”) As with any abstractions (mutexes included!) the initial learning is costly and seems either pointless (“look, I have this perfectly working solution right there”) or of little worth (“look, I have a solution right there that works well enough”).

I learned about mutexes (and semaphores and so on) and monads in the same year. Both were confusing at first. Mutexful programming is not easy.

1 Like

All completely correct. Though, I learned CPS (and the Felleisen/Danvy style of CPS translation that eliminates all administrative redexes) long before I learned threaded programming. Ditto functional programming (I wrote Coq 5.10). So I’m very, very aware of the value of and methods of functional programming.

But you should add:

Nobody writing transactional applications should ever write code that manipulates in-memory shared-state. It’s almost always a “crash landing”, and there are many good reasons for this. And this is well-established: @yawaramin mentioned it a few comments up-thread, for instance. So nobody writing actual applications will be manipulating mutex/condvar, critical sections, etc. The only people doing that will be people writing the infrastructure.

So the choice here isn’t between “monadic concurrency” and “mutex/condvar”. It’s between “monadic concurrency” and “direct style”.

And, sure, the infrastructure developers have to know how to write their infrastructure with the relevant stuff, be it monadic concurrency or mutex/condvar/thread.

P.S. I wrote a monadic I/O system for ocaml in 2002, for use in a high-concurrency scriptable web-stress-tester: so I -do- understand the value of monadic I/O, and specifically for achieving extremely high concurrency for networked systems, specifically in order to overwhelm the target system.

I’m not sure how useful it is, as there is a lot of unrelated stuff in there, but you can see it here (I can attest that it works for me, not that there are zero bugs):

The Abbs_channel.{send,recv} functions are little wrappers around using a promise to signal whoever is waiting on a recv. The channel is used to pass in a promise that the connection to use will be put into, and the channel is used to return the connection once done. Based on how my concurrency monad works, cancelling the consumer of the connection (due to a timeout or because some dependent operation has failed) happens essentially for free, with the pool only needing to do the check on line 30.

I see three points you’re making about concurrency monads:

  • They’re slow when there’s nothing to do concurrently
  • They’re hard to reason about
  • They’re not that useful in “ordinary” apps.

I pretty much disagree with all three claims.

As to performance: yes, allocating tons of intermediate Deferred.t’s is slow, and slower than a single-threaded solution. But no one is forcing you to do that. We build tons of applications where most transactions are responded to synchronously and all at once, consciously trying to avoid excess blocking, and with a minimum of allocation. This aids both the simplicity and the performance of the resulting code.

As to your second point, about monads being hard to reason about, I just don’t know where you’re coming from. Yes, you can run into some performance issues if people block inside of an Async program. We mostly deal with that by manipulating the namespace to hide blocking operations, which in practice works pretty well.

But… working with threads is 1000 times worse! Mutexes, monitors, condition variables, shared-memory concurrency, etc., are just incredibly hard to reason about. And when you get it wrong, you don’t have some unexpected pauses in the execution of your program, you have deeply messed-up, hard-to-predict behavior. For me, the primary win of monadic concurrency is that it’s far easier to reason about than the alternative of shared-memory threads.

Finally, your point about monadic concurrency being unnecessary most of the time, again, this doesn’t line up with my experience. It’s true that for many services, client requests are handled simply and with a minimum of concurrency. Indeed, for lots of applications we run, the vast majority of client requests are handled entirely synchronously. And yet, it’s often convenient to have tasks that are running concurrently with user requests that are doing things like updating the server’s view of the world, or interacting with monitoring or alerting tools. Async makes building such systems easy and simple.
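The kind of background task described above is easy to express under a cooperative scheduler. A hedged Lwt sketch (Async’s equivalents are similar; `refresh_view` is a hypothetical stand-in for updating the server’s view of the world):

```ocaml
(* Sketch: a background refresh loop running concurrently with
   request handling on the same event loop. [refresh_view] is a
   hypothetical non-blocking operation of type unit -> unit Lwt.t. *)
let rec poll_loop refresh_view =
  let open Lwt.Syntax in
  let* () = refresh_view () in
  let* () = Lwt_unix.sleep 5.0 in  (* cooperative sleep; doesn't block *)
  poll_loop refresh_view

(* Detach the loop; its final promise is deliberately dropped. *)
let start_background refresh_view =
  Lwt.async (fun () -> poll_loop refresh_view)
```

The request handlers themselves stay entirely synchronous; the loop simply interleaves with them at its bind points.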

None of this is to say that monadic concurrency is the best of all possible worlds. I’m looking forward to a version of OCaml where we can use multicore OCaml and typed algebraic effects to write direct-style concurrent programs where the type system lets you know where interleavings can happen so you can avoid racy code. But for now, I think monadic concurrency is the best option we have.

A couple of more responses to specific points you made.

I beg to differ. I’ve worked on quite simple and quite complex servers, and we’ve found Async to be a good match for both. Again, Jane Street’s systems run at pretty high scale, and we’ve just not seen the pain points you’re referring to here.

No one is saying that you should never interact with system threads. We wrap up lots of third party C libraries, and yes, that requires some interaction with threads. But we don’t think that threads are a good API to expose widely to users, because they’re extremely hard to use correctly.

I don’t understand the terminology above, but it makes me wonder if you’re assuming a somewhat more rigid scheme about how applications are built than would line up with my reality.

Your view of web applications seems to be things that would be implemented plausibly with a cgi-bin script. But web applications for us are a more complex group of applications, most of which have client sides that are essentially OCaml programs (using js_of_ocaml) that are interacting in quite complex ways, including streaming incrementally updating data from the servers they’re connecting to.

I don’t know whether the server piece there is a transactional app server or a communications transponder, but I do know that we’ve found Async to be an effective tool for implementing such systems (and many others.)

10 Likes

Scaling is done by using several processes, on one or on different machines. A load balancer dispatches requests from a given client always to the same server, in order to be able to keep session state in memory. And we have a communication system between servers (for example to dispatch notifications) through SNS/SQS on AWS.

By this, do you mean you essentially have a function that is doing as much synchronously, as quickly, as possible and handing a string off to be asynchronously written back to some buffer, all happening within the context of the async scheduler?

If you have mastered programming with OS threads, then I find it hard to accept that writing code to Lwt’s concurrency monad should come as a shock, even if you only have a sketchy grasp of monads. Perhaps more could be done in the tutorials to cater for people used to programming with OS threads, as there are some obvious parallels which can be offered to help them. The equivalent of a native OS thread of execution is no more than a series of dependent promises held together by the consecutive application of Lwt.bind (or its syntactical equivalent let*, let%lwt or the infix >>=) to the continuations of the promises. The equivalent of a new detached OS thread is a series of one or more such continuations whose final promise is ignored, either by the explicit use of ignore or by the application of Lwt.async. The equivalent of a new joinable thread is a series of such continuations upon which the event loop “waits” via Lwt.join.
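The parallels described above can be sketched in a few lines of Lwt (the `step` functions are hypothetical stand-ins for non-blocking operations):

```ocaml
(* Hypothetical non-blocking operations. *)
let step1 () = Lwt.return 1
let step2 n = Lwt.return (n + 1)

(* "A thread of execution" = a chain of dependent binds: *)
let chain () = Lwt.bind (step1 ()) (fun n -> step2 n)

(* "A detached thread" = a chain whose final promise is dropped: *)
let detached () =
  Lwt.async (fun () -> Lwt.bind (step1 ()) (fun _ -> Lwt.return_unit))

(* "Joinable threads" = chains the event loop waits on via join: *)
let joined () =
  Lwt.join [ Lwt.map ignore (chain ()); Lwt.return_unit ]
```

Seen this way, the monadic program is just the thread program with the scheduling points made explicit.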

When I first came across OCaml’s Lwt, I remember writing a toy program (just a socket-based echo bot) using a Scheme library (one which used an event loop, with coroutines via delimited continuations as an equivalent to promises), Lwt, and pthreads, in order to help gain an understanding of how Lwt worked. The structure of the three implementations turned out to be practically identical. The only special thing to remember with event-loop-based concurrency is of course that you mustn’t block except through binding on the promise or coroutine.

1 Like

I’m not sure the problem with concurrency monads is that they are hard to grasp, especially now that JavaScript has hard-coded one into its language definition.

But off the top of my head, here are a few things I find problematic about them:

  1. There is more than one in the ecosystem. The two most popular, and the paraphernalia of supporting libraries that come with them, have non-trivial implementations.
  2. They compose poorly, especially if you’d like to use another monad in your program. That’s the problem with monads, of course.
  3. They pollute libraries because they do not allow separating concerns. When I’m writing a codec I don’t really care whether the client moves the data synchronously or asynchronously; I just want to call a function to get or push the next bytes. If you are happy functorizing your code over IO monads, so be it, but dependency-wise, functors don’t scale. As far as I’m concerned that’s not a good solution, nor is CPSing your code like I have done too many times, like a good monkey.
  4. They kill a lot of the hard work OCaml devs are putting into giving us good backtraces.
  5. They introduce yet another scheduler, on top of the runtime system, the OS-level thread scheduler and possibly, if you are multiprocessing, the OS process scheduler.
  6. Since POSIX is what it is, you still need OS-level threads to async those system calls that aren’t, which again adds its own complexity to your async scheduler.
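Point 3 is worth making concrete. Abstracting even a trivial codec over an IO monad means functorizing everything downstream of it; a hedged sketch of what that looks like (the signature and names are mine, for illustration):

```ocaml
(* A minimal IO-monad signature a library might functorize over. *)
module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t
  val read_byte : unit -> int t   (* hypothetical input source *)
end

module Decoder (Io : IO) = struct
  (* Read two bytes and combine them into a 16-bit value. The codec
     logic is IO-agnostic, yet every step must thread through [Io]. *)
  let read_u16 () =
    Io.bind (Io.read_byte ()) (fun hi ->
      Io.bind (Io.read_byte ()) (fun lo ->
        Io.return ((hi lsl 8) lor lo)))
end
```

Every consumer of `Decoder` must now also be a functor over `IO` (or pick a concrete monad), which is the dependency-scaling complaint above.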

We have obvious proof that concurrency monads work in OCaml, but when you look at the whole picture, as far as I am concerned, it doesn’t look that great.

I’d also be curious how high-level languages could be better at exploiting all the work that went into making OS-level threading not so heavyweight nowadays (at least on Linux, it seems), rather than piling up schedulers and layers of runtime complexity.

10 Likes

You are probably right (it sounds as if you have a lot more experience on this than me), but on your points 3 and 6, if you want to interface with a synchronous/blocking OCaml library and you are writing code against an Lwt event loop, then Lwt_preemptive.detach offers something of an escape hatch.
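A hedged sketch of that escape hatch: running a blocking call on a preemptive worker thread from inside an Lwt program (`blocking_query` stands in for any synchronous library call):

```ocaml
(* [blocking_query] simulates a synchronous library call that
   would stall the event loop if run directly on it. *)
let blocking_query () =
  Unix.sleep 1;
  "result"

(* Lwt_preemptive.detach runs the call on a worker pthread and
   hands back an Lwt promise, so the event loop keeps running. *)
let run_in_pool () : string Lwt.t =
  Lwt_preemptive.detach blocking_query ()
```

This is essentially the “native thread pool + trampoline back to green threads” pattern described earlier in the thread, packaged as a library call.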

I have to say that I have been quite impressed by Lwt. I also found that by exposing Lwt.task and Lwt.wakeup_later to the user, Lwt makes it easy to use with (for example) node.js’s event loop. To each their own.

Is this something that typed algebraic effects might make better?

I could be wrong, but it feels to me like either:

  1. We need to give up on backtraces as a form of debugging an application (or at least expect less from them).
  2. Integrate concurrency into the runtime so it can track stacks at an “application” level.

Regardless of whether either of those is a good idea, does that analysis seem accurate?

1 Like

I largely agree with this summary of the downsides of monadic concurrency (with the exception that I’m not sure avoiding a second scheduler is possible or desirable). Note that the full Multicore OCaml plan involves a scheduler for moving threadlets around; having an extra scheduler may turn out to be a necessary evil!

My primary hope in this direction is that typed algebraic effects may turn out to be compelling enough, and could provide a good way to merge together the various and sundry approaches to concurrency and IO that exist in the OCaml world today. But it’s going to take a while for that one to land.

y

5 Likes