One thing I’ve wondered off and on is why all the popular web frameworks for OCaml seem to be based on monads. Back in prehistory, I remember that Gerd Stolpmann wrote an Apache plugin that linked in OCaml, and I remember somebody (Gerd again?) wrote a FastCGI harness for OCaml. These harnesses all assume code is written in direct style, and even if that code is all dead now, it wouldn’t be hard to (for instance) write a new FastCGI harness for OCaml, so it could be run from Apache’s mod_fastcgi.
I’m curious why nobody who uses OCaml for web-app development has done this. It would make writing the actual “servlets” so, so much simpler.
P.S. I also wrote a “mod_ocaml” for Apache, back in the day, but that code is gone into the Great IBM Graveyard In The Sky. It’s just not that hard to do, is my point.
I agree with you; I am building a file server in OCaml and I spent a while in analysis paralysis over what concurrency method to use. Then I saw a comment from @Chet_Murthy saying something to the effect of “just use threads!” and I was enlightened.
I think it’s very easy to fall into the trap of trying to build everything at scale, with the fantasy that we’ll be serving 200K requests per second on a resource-constrained system. It’s also easy to assume threads are slow or unusable by extrapolating from caveats about context switches or the GIL, without considering how your application will run. Modern Linux and BSD kernels are really good at running thousands of threads.
It’s much easier for me to start with Threads, writing direct, synchronous code, and then identify the areas that are a performance concern later.
Even if GIL contention became an issue I would consider putting multiple processes behind something like inetd or a forking web server just as readily as I’d consider rewriting everything to use Lwt or Async.
If you ever get to the point of needing scale, then before doing anything to your core code, consider putting an I/O harness in front of your normal threaded system. For most uses (when not streaming large blobs) that’s pretty safe. What do I mean?
1. Make sure that your wire protocol is easy to parse with a non-recursive parser. HTTP qualifies (with some pain), but even better would be a protocol that either prepends a length, uses length-prefixed chunks (of some bounded size, so you can ensure the buffer has enough room), or uses escaping plus a terminating sentinel. Raw JSON, for example, would not qualify.
2. Then, when/if you find that you have a gazillion sockets+threads reading/writing, you can replace them with a single thread+epoll that maintains read- and write-buffers, which it fills up (while reading) or drains (while writing); this can be tested independently of the rest of the logic of your system.
3. Critically, in any significant network-oriented, I/O-intensive server, managing the -size- of these buffers is important, and centralizing them allows precisely that sort of management. But also [following Mark Hayden’s Ensemble system] you can explicitly manage these buffers, and since they’ll be the most important long-lived data objects in your system, you should be able to leverage that into much better performance (fewer GCs).
4. And then, if you need to deal with blobs (either reading or writing), you have a natural place to hang callback hooks, which allows all other code to be written in direct style.
All of this should mean that you can keep the number of actual threads at some small multiple of the actual concurrency of your backend system/store/hardware, rather than scaling it with the concurrently-presented workload.
And again: don’t do any of the above (except #1) until you’re actually presented with the problem.
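To make the framing point concrete, here’s a minimal sketch (my own illustration, not from any particular library) of reading one length-prefixed frame with a bounded buffer, so the parser never recurses and never over-allocates:

```ocaml
(* Hypothetical sketch: read one length-prefixed frame from a channel.
   Assumes a 4-byte big-endian length prefix and a caller-chosen
   maximum frame size, so the read buffer stays bounded. *)
let read_frame ?(max_len = 1 lsl 20) (ic : in_channel) : bytes =
  let hdr = Bytes.create 4 in
  really_input ic hdr 0 4;
  let len = Int32.to_int (Bytes.get_int32_be hdr 0) in
  if len < 0 || len > max_len then failwith "frame too large";
  let buf = Bytes.create len in
  really_input ic buf 0 len;
  buf
```

Because the length is known up front, the same loop works unchanged whether it’s driven by a blocking thread or by an epoll-style harness that hands over a filled buffer.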
I think it’s probably usable for serious stuff if it’s behind nginx or another reverse proxy. You still need some sort of framework on top as it only provides a basic router abstraction for query handlers.
edit: I forgot to mention the one cool feature: create takes an optional function to make a new thread, called on each query, so you can use a thread-pool or something like that instead of forking a whole new thread.
IME, there isn’t much benefit to direct style in that scenario.
For the most part, writing code in a concurrency monad means adding some >>=, and with the new support for custom let-bindings it’s even less different from direct code. Some kinds of APIs need to be different, but most of those already exist, with semantics similar to the existing APIs; they’re “just async”.
It’s pretty easy to go from async -> direct if you have a thread API underneath, going the other way tends to be more challenging (depends on the concurrency framework).
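To illustrate how small the syntactic difference is, here’s a toy monad (a stand-in for Lwt/Async, not their real APIs) showing both the >>= style and the custom let-operator style next to what direct code would look like:

```ocaml
(* Toy "deferred" monad, standing in for Lwt or Async, purely to show
   how little the surface syntax differs from direct style. *)
module Toy = struct
  type 'a t = unit -> 'a
  let return x = fun () -> x
  let bind m f = fun () -> f (m ()) ()
  let ( >>= ) = bind
  let ( let* ) = bind
  let run m = m ()
end

(* Direct style would be:      let x = read () in x + 1
   With the custom let-binding it reads almost identically: *)
let program (read : int Toy.t) : int Toy.t =
  let open Toy in
  let* x = read in
  return (x + 1)

(* The same thing in the older >>= style: *)
let program_bind (read : int Toy.t) : int Toy.t =
  Toy.(read >>= fun x -> return (x + 1))
```

With real Lwt or Async, only the module name and the underlying scheduler change; the shape of application code is the same as in this sketch.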
As soon as one wants to share data between two threads, life gets a lot harder, even in OCaml with a GIL.
All in all, IMO monads are not more difficult to understand than threads, nor more difficult to write than threaded code, and they also provide a fairly direct path to scaling up.
A little story: I have a friend whom we trained (back at IBM) in a sort of “bootcamp” in how to troubleshoot enterprise web-app dumpster fires. He finally got experienced enough that he went on his first solo “crit-sit” (critical situation). Afterward, he told me that when he arrived, he asked for a copy of the source code (this was a J2EE web-app), grepped for the word “synchronized”, and told the customer’s engineers to carefully verify that every instance of synchronized was correctly coded. That is: he put them to work verifying that every instance of sharing was properly protected with proper mutex/locking. And then he went for a long coffee break. Needless to say, they found a bunch of bugs in their handling of shared data at those locations.
It is vanishingly rare that application programmers are able to properly code up shared/multi-thread-accessible data: it is always better to put that data in an external store, even if it’s memcached on the same machine.
To your points: I asked, and different people have different experiences. I will note, though, that
there is a recurring post of the form “I don’t understand this LWT thing … help?” where there’s nothing like that for direct style code, because … (next point)
There’s a joke about Paxos: “there are people who think they can implement Paxos, and people who know that they can’t implement Paxos”. In a similar vein [and putting on my transaction-processing hat], it is almost neverevereverever the case that one should allow sharing between threads in “application code”. In a typical multithreaded web-app server, there will be objects that reside in pools accessible by multiple threads; these are typically:
a. the config
b. the network harness
c. various backend connection pools
There is almost never a good argument for application writers to share data between threads except thru external stores. And why? Because (again, this is with my transaction-processing and fault-tolerant-systems hat on) you MUST assume that the web-app server address-space is ready to crash at any moment. When it crashes, you want the shared data to survive that crash. Furthermore, you want access to the data to enjoy some comprehensible serializability semantics (maybe not full serializability, but still, something comprehensible).
I once asked the guy who convinced Facebook to use PHP, why he chose that language. He told me that he did so because application programmers need a sandbox so restrictive that they will not make mistakes. PHP prevents concurrent requests from sharing data, and wipes all mutable variables clean at the start of each request. I pointed out that he could have gotten something similar with Perl and some coding guidelines; he replied that sure, you could, but it would require too much intelligence on the part of the application programmers.
It is nearly always better to share mutable data thru an external store, than with in-memory sharing, in transaction-processing systems. Those rare instances where it’s worth sharing mutable data in-process are so vanishingly rare as to prove the rule.
It is a very widely used library in the OCaml ecosystem. A lot of people’s first experience with OCaml might be something with Cohttp, or using a database library, or doing something with Mirage. It stands to reason that there will be more questions about Lwt if newcomers to the ecosystem end up using libraries that in turn use Lwt.
That being said, I totally see your point, and I agree that for a large chunk of applications the built-in thread support should be just fine. But from a complexity perspective I wouldn’t classify Lwt as a lot more difficult to use/learn than the Thread module (though that is a very subjective opinion from my own personal experience in my year of learning OCaml). Apart from that, I use Lwt and Async for another reason, which is ecosystem support. A lot of libraries I care about use one of them for IO. I also happen to like a lot of the functionality that Lwt and Async ship with out of the box, which lets me work on the problem at hand.
Adding to this, I really like the approach taken by libraries that do provide Lwt/Async extensions but are written in a way that still allows someone to avoid them if needed. httpaf has been one example that I’ve seen recently, and I’ve tried to follow a similar pattern in a WIP postgres client I’ve been experimenting with.
In the fall of 1994, Guy Steele was going around giving this talk about what one might call “syntactic backward-compatibility”. It was about the many examples of “new versions of old languages” that didn’t preserve enough backward-compatibility of the syntax, and thus failed. I have some vague memory of an example from HPF (High Performance Fortran) coming up. His thesis was that a certain level of syntactic familiarity was necessary for programmer adoption, and that language-designers who violated that often regretted it.
Now at the time, he’d just taken a job at Sun Microsystems, for a hush-hush project that he couldn’t discuss. Heh, we know now that that job was to clean up the design of Java (just as he’d done for Scheme, C, C++, and maybe other languages).
A year-or-two later, as I saw the thundering herd of C++ programmers, stampeding towards Java, I thought to myself:
“Guy could have just put <<It’s damn curly-braces! The curly-braces!>> on a slide, and left it at that”.
I completely share your preference for OCaml’s syntax, and mystification at others’ balking. [then again, I’m a rabid partisan of Perl’s syntax, so go figger.] But I think Guy was right, and the “surface familiarity” of Java’s syntax to C++ programmers was key to its [initial] success. I’ll note that Golang’s [spit, then spit again] surface similarity to Python’s syntax has a similar effect.
Programmers, esp. the unwashed who make up the vast bulk of our profession, are inscrutable beasts. Ah, well.
But when Eich finally took that fateful position at Netscape the next year, “I was lured with this idea of doing a very-popular-with-academics language called Scheme… The idea was ‘Come and do Scheme in Netscape. Put this programming language into the browser.’” He later calls Scheme “that beautiful research language I was tempted with.” But by the time he’d joined Netscape, they had a deal with Sun Microsystems, which was now pushing their newly-minted language Java. “And suddenly the story was, ‘Well, we don’t know if we want Scheme. We don’t know if we even need a little language like we wanted you to do. Maybe Java’s enough.’”
I’m keen on going even a step further (or, if orthogonal, aside): scale towards n=1 and use plain CGI, with no long-running server and no framework at all. It depends on what you need, but when it’s sufficient, it simplifies things a LOT. It tends to be overlooked and is slowly being forgotten; nobody dares to advocate it, it seems, because it’s just too boring.
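To show just how little there is to it, here’s a hypothetical minimal OCaml CGI program (my own sketch, not any particular framework): the web server forks one process per request, so there is no shared state and no concurrency to manage at all.

```ocaml
(* Hypothetical minimal CGI "servlet": the server forks a fresh process
   per request, hands request metadata over via environment variables,
   and we write the response (headers, blank line, body) to stdout. *)
let respond ~path =
  "Content-Type: text/plain\r\n\r\n"
  ^ Printf.sprintf "Hello from OCaml CGI, path = %s\n" path

let () =
  (* PATH_INFO is set by the web server under the CGI convention. *)
  let path = try Sys.getenv "PATH_INFO" with Not_found -> "/" in
  print_string (respond ~path)
```

Compile it to a static binary, drop it in cgi-bin, and you’re done; process startup cost is the only thing to measure before deciding whether you need more.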
with Tiny_httpd using threads and Opium using Lwt. For those with experience in this domain: any criteria for when to use what? I find Opium attractive because it is a still-small framework that solves common problems.
Addendum: if you are doing anything other than serving static content or simple dynamic content, I believe you would value the services provided by a framework over a pure server:
routing and access to parameters in routes
SSL (a sore point in the OCaml web domain)
DB connection handling
I believe Opium is adding a lot of value and that has nothing to do with threads vs. IO monad.
There’s nothing wrong with forking a fresh process for transactions that are infrequent enough. Nothing at all. I think we’d all be surprised at how much “legacy CGI” is out there in enterprise app deployments, even to this day.
I am probably the one responsible for this trend, together with Jérôme Vouillon.
We decided to use cooperative threading in Ocsigen because it makes writing concurrent programs much simpler. You don’t have to worry about shared memory any more. No need for mutexes, no deadlocks … And it’s very efficient.
Jérôme wrote Lwt for this, and the use of monads simplifies programming a lot too, especially with the syntax extension.
People massively adopted Lwt for all concurrent applications in OCaml, not only for the Web.
Oh, it’s not your or Jérôme’s doing: you built a nice thing, and people used it. That’s great, and I’m sure not going to argue against your work. What I don’t understand is that for almost all web-apps, there is no value in shared data, and great danger, too (shared data should be stored in external stores: memcached, an RDB, etc.). And there were a few abortive attempts at building webserver plugins and such. But they went nowhere. And this is what confuses me: that something so obviously less accessible would get traction, when the obviously more accessible, easier-to-understand-because-just-like-all-other-code option never even got started.
But hey, it’s all water under the bridge, and since I no longer work in transaction-processing, I can’t really be bothered to care too much.
Indeed our goal was not to do “just-like-other-code”, as we were paid to do research on Web programming. Maybe the academic background of many OCaml libraries is one part of the answer.
However, I really believe that programming with Lwt is easier. Even if you need to learn and understand new concepts in the beginning, you will end up with something a lot easier to maintain and much more reliable.
In Ocsigen we introduced many other advanced concepts like this: multi-tier programming, HTML typing with polymorphic variants and phantom types, a service identification mechanism, advanced session management, and even some functional reactive programming.
This requires an initial commitment, but at the end you get a very strong app, easy to maintain and very quickly developed. This made it possible for us at Be Sport to build a fully functional social network with a very small team. Our interns or new engineers need only a few weeks to get fully operational (of course it helps a lot to have skilled developers to speak with every day …).
The initial effort is 1000% worth it.
You’ve described a bunch of interesting and valuable features. And you’ve also described how you started off on a base of monadic concurrency. I would like to believe that that’s orthogonal to the question of whether monadic concurrency was helpful, or a hindrance.
BTW, how did you handle scaling? That is, once you reach full utilization of a single core and you need to scale to multiple cores? And then to multiple machines? Was there in-process read-write shared state? How did you deal with that across processes/machines?
ETA: A further question (if you’re willing/able to answer): do you have any idea what the maximum presented concurrency for a single process was? Was it due to actual active requests, or reverse-ajax/long-poll-style requests? And if the latter, how did you scale this past a single address-space?
I can answer this one (actually probably a lot of people can, it’s a very standard question in today’s web stacks)–it’s common practice for single-threaded web apps like Node.js, Python, Ruby, etc. to run stateless and spin up multiple instances (one per core) and load-balance among them using a reverse proxy. You can see examples of this in the recent thread on web framework benchmarking.
As for how to deal with state: outsource it to something that knows how to deal with state, like a database (e.g. Postgres) or a cache (e.g. Redis).
EDIT: you also asked about scaling to multiple machines. The answer is very similar. Orchestrate a deploy of the same app to multiple hosts and load balance among all of them.
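As a concrete (and purely illustrative) sketch of that setup: an nginx reverse proxy load-balancing four single-threaded app instances, one per core, each listening on its own port. The ports and instance count here are assumptions, not from any particular deployment.

```nginx
# Hypothetical nginx config: load-balance across four stateless app
# instances (one per core), each bound to its own local port.
upstream app {
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    server 127.0.0.1:8083;
    server 127.0.0.1:8084;
}

server {
    listen 80;
    location / {
        # nginx round-robins requests among the upstream servers.
        proxy_pass http://app;
    }
}
```

Scaling to multiple machines is the same shape: the upstream entries point at other hosts instead of loopback ports, and state still lives in the shared database/cache.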