7 years later: Is OCaml suitable for writing networking servers?

I just found this thread on SO:

I was wondering if OCaml will perform well in terms of performance and ease of implementation while dealing with typical client/server interactions over TCP in a multi-threaded environment… I mean something really typical, like having a thread per client that receives data, applies changes to the game state and sends them back to clients.

This is because I need to write a server for a game and I always did these things in C, but since I now know OCaml I was curious to know if it would be OK or if I’ll just find myself trying to solve a typical problem in a language that doesn’t fit it well…

It was posted 7 years ago. At that time, Pascal Cuoq pointed out issues with both performance and ease of implementation.

I was wondering whether anything has changed since then.

I’d say yes and no. To begin with, you wouldn’t use multithreading in OCaml (at least not yet), because the runtime hasn’t changed; it is still as Pascal describes. But you don’t need to, because you would most likely use a monadic concurrency library like Lwt (which is picking up more and more speed) or Async. I have found that dispatching requests over a message queue to multiple processes (on multiple servers) works rather well in practice and has a number of advantages, so spawning as many OCaml processes as needed is a decent option for me.
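
To make the single-threaded model concrete: libraries like Lwt and Async multiplex many connections in one thread on top of OS readiness notification. The sketch below is not Lwt or Async code. It is one pass of a hand-rolled loop using only the standard `Unix` module, and `echo_ready` is a name made up for illustration.

```ocaml
(* Sketch only: one pass of a single-threaded event loop, using just the
   stdlib Unix module (link with the unix library). [echo_ready] is a
   hypothetical name, not part of Lwt or Async. *)
let echo_ready (clients : Unix.file_descr list) : int =
  let buf = Bytes.create 4096 in
  (* Block for at most one second until some client has data to read. *)
  let ready, _, _ = Unix.select clients [] [] 1.0 in
  List.iter
    (fun fd ->
      let n = Unix.read fd buf 0 (Bytes.length buf) in
      (* n = 0 means the peer closed; a real server would drop the fd here. *)
      if n > 0 then ignore (Unix.write fd buf 0 n))
    ready;
  (* Number of clients that were serviced in this pass. *)
  List.length ready
```

A real server would call this in a loop, add newly accepted sockets to `clients`, and deal with partial writes and `EINTR`; that bookkeeping is the kind of thing a concurrency library handles for you.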

Personally I would probably not use JoCaml. It is quite a way behind OCaml, and the documentation and community are sparse.

Until Multicore OCaml is out, OCaml can’t compete in this space. You could try to use multiple processes (as @Leonidas says), but it’s unlikely to work out well. For a web server, this could work with effort: have one or more processes handle incoming requests and then put the requests in a locked shared-memory message queue for other processes to handle. It would be extremely awkward to set up though, and the shared-memory area would need to be managed explicitly (with manual allocation and deallocation). The key point is that there is minimal shared data between processes in this instance, but this is still a pain to set up because it all has to be done via the operating system rather than the language.

For a game server, this would be much harder, since presumably there is significant shared state that needs to be accessed by all the processes. It would be painful to set this up in OCaml, since the shared state needs to have some garbage collection method which simply doesn’t exist (except for ocamlnet, which implements its own garbage collector, but is flaky). Ironically, it’s much easier for a reference-counted language such as Python to pull this trick off.

Anyway, the answer is pretty much no, unless you’re good with using just one core, which is very unlikely. On the plus side, your GC latency would be terrific :slight_smile:

I’m surprised by this response. In my experience, OCaml excels at writing various kinds of networking servers, and such servers rarely require shared memory. Indeed, for such servers to scale properly, they have to be able to work without shared memory so you can scale them across multiple physical machines.

Indeed, I think there’s a general feeling that without a multicore GC, you can’t write programs that use multiple cores, which is just silly. There are lots of ways of coordinating and communicating amongst many processes without shared memory, which again is absolutely necessary if you’re going to scale across machines.
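
As a rough illustration of coordinating processes without shared memory, here is a minimal sketch of a forked worker that receives requests and returns replies over pipes, using only the stdlib `Unix` and `Marshal` modules. The names `request`, `spawn_worker`, `call` and `stop` are invented for this example, and `Unix._exit` needs OCaml 4.12 or later. It is not a description of any particular production setup.

```ocaml
(* Sketch only: message passing between processes over pipes.
   All names here are hypothetical; link with the unix library. *)
type request = Work of string | Quit

(* Fork a worker that applies [handle] to each request it receives. *)
let spawn_worker (handle : string -> string) =
  let req_r, req_w = Unix.pipe () in
  let res_r, res_w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* Child: keep only the ends it needs, then serve until Quit. *)
    Unix.close req_w;
    Unix.close res_r;
    let ic = Unix.in_channel_of_descr req_r in
    let oc = Unix.out_channel_of_descr res_w in
    let rec loop () =
      match (Marshal.from_channel ic : request) with
      | Quit -> ()
      | Work s ->
        Marshal.to_channel oc (handle s) [];
        flush oc;
        loop ()
    in
    loop ();
    Unix._exit 0
  | pid ->
    (* Parent: keep the write end for requests and the read end for replies. *)
    Unix.close req_r;
    Unix.close res_w;
    (pid, Unix.out_channel_of_descr req_w, Unix.in_channel_of_descr res_r)

(* Send one request and wait for its reply. *)
let call (_, oc, ic) s =
  Marshal.to_channel oc (Work s) [];
  flush oc;
  (Marshal.from_channel ic : string)

(* Ask the worker to exit and reap it. *)
let stop (pid, oc, _) =
  Marshal.to_channel oc Quit [];
  flush oc;
  ignore (Unix.waitpid [] pid)
```

For example, `let w = spawn_worker String.uppercase_ascii in call w "ping"` sends one request to the child process and reads back its reply. Swapping the pipes for sockets is what lets the same shape span several machines.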

To be clear, shared memory is sometimes precisely what you need, and we’ve written serious OCaml programs which rely on shared memory segments (not as part of the heap, but for sharing custom data structures.) But most of the time, various forms of network communication are both sufficient and really are the preferred approach.

y

I want to add some anecdotes to this discussion, about competing in this space. It’s not clear that every actual server application benefits from multithreading.

  • In a low-latency scenario in C++, we once deliberately avoided threads per client, even though, clearly, the language supports them (and we had no cross-platform worries). We had processes per core. If you have a lot of shared state and moving it into a shared memory region is not feasible, you will probably want threads, but per core, not per client.
  • I later saw this video (also from the C++ world):


  • I “heard” that Node.js, which has almost the same execution model as OCaml with respect to threading, does quite well in this space.

EDIT: And presumably you even need those threads/processes per core only once you find out you are CPU-bound with one thread, or have some truly severe latency requirements, which often turns out not to be the case.

And yet another anecdote, which is that I was recently involved in the conversion of some promise-based (like Lwt or Async) code to use actual threads. This wasn’t due to networking problems, but rather to problems with the lack of non-blocking file I/O APIs on Unix, the access pattern we had, and the inability to run those I/Os quickly with that access pattern without resorting to threads.

When this code was running multiple threads, it had higher latency than a version that started 1 thread, in a low-concurrency scenario. This might be due to the effects described in the linked video, but we didn’t really investigate. We didn’t look at high-concurrency scenarios. Under high concurrency, I would expect threads to benefit latency only up to the number of cores available, and only if they are somehow pinned to cores. Given that the number of cores (fixed, ~10) is often way less than the concurrency level (variable) in many applications, I’d expect this benefit to latency to be insignificant, except in very specific situations.

I’d expect a benefit to throughput, but, again, only if the server is CPU-bound to begin with, and requests don’t interfere with each other too much during processing. The point about not interfering much again suggests that a shared memory region can often work.

I don’t think it’s a coincidence that there isn’t a single OCaml web server able to compete with the main servers out there, despite the fact that we have such products in the ecosystem. In any case, the OP asks about a game server, which requires heavy state sharing – message passing simply isn’t sufficient in most cases.

That’s fine – C++ gives you full flexibility and control, as we all know. Manual memory management/reference counting also means that you can create a shared memory area that contains whatever you want, if that’s what you want (though I don’t know why one would choose that over threading). This is much harder to do in OCaml due to GC requirements.

It doesn’t. It’s also interpreted (with JIT), which doesn’t help, but it’s very limited. I mean, in a world where Ruby and Python can host web services, you kind of give up on speed altogether, but it’s not competing with any of the serious contenders. e.g. here

But this is exactly the optimal way to do threading. You get no additional benefit from having number of threads > number of cores. You only get context switch penalties. It’s harder to program this way, but it’s completely doable.

Shared memory is a huge pain in OCaml due to GC. It’s a huge pain in general (and limits portability), but you generally have to settle on a specific data structure and do manual memory management.

Multiple reasons, one of them being that with a shared memory region, you opt in to what you want shared. You design a nice API around it, and it becomes a module you can easily reason about. Compared to that, with multiple threads and unrestricted memory sharing, you end up sweating every code review.

That may be true; I haven’t tried it in OCaml. For the benefit of other readers, I think this would typically be done using Ctypes – the actual memory region wouldn’t be in the OCaml heap. However, it may be somewhat difficult to keep bookkeeping information about the region or its contents, since that information would live in the OCaml heap. I doubt that, though.
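
For readers who want something concrete: one way to get a region outside the OCaml heap without Ctypes is a memory-mapped `Bigarray` created with `Unix.map_file`. The thread does not say which approach any of the posters actually used, so treat this as a sketch under that assumption. `map_shared` and `run_workers` are hypothetical names, and `Unix._exit` needs OCaml 4.12 or later.

```ocaml
(* Sketch only: a file-backed shared memory region visible to forked
   processes. Names are hypothetical; link with the unix library. *)

(* Map [len] int64 slots from [path]; the file is grown if it is too small. *)
let map_shared path len =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
  let arr =
    Unix.map_file fd Bigarray.int64 Bigarray.c_layout true [| len |]
    |> Bigarray.array1_of_genarray
  in
  Unix.close fd;  (* the mapping outlives the descriptor *)
  arr

(* Fork [n] children; child i writes (i + 1) * 10 into slot i.
   The parent waits for all of them and sums the slots. *)
let run_workers n =
  let path = Filename.temp_file "shm" ".bin" in
  let slots = map_shared path n in
  Bigarray.Array1.fill slots 0L;
  let pids =
    List.init n (fun i ->
      match Unix.fork () with
      | 0 ->
        (* Each child touches only its own slot, so no locking is needed. *)
        slots.{i} <- Int64.of_int ((i + 1) * 10);
        Unix._exit 0
      | pid -> pid)
  in
  List.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids;
  let total = ref 0L in
  for i = 0 to n - 1 do
    total := Int64.add !total slots.{i}
  done;
  Sys.remove path;
  !total
```

With `n = 4` the children write 10, 20, 30 and 40, so `run_workers 4` returns `100L`. Only flat numeric data lives in the region; anything richer needs a hand-designed layout, and slots written by more than one process need explicit synchronization, which is the manual management the posts above complain about.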

I agree that OCaml probably fails hard for game server programming (I also have experience here, but not in OCaml). However, I want to point out that you broadened the discussion to web, and some of the statements made were quite general, so some of the replies were to those statements.

I don’t know what this benchmark shows – did the implementations become CPU-bound? Are the implementations good? If I were making a decision about something in a production scenario, the information I got from that text would not be sufficient. This also applies to the current state of OCaml implementations.

Indeed. However, OP quotes one thread per client in their quoted text. I am suggesting you basically never want that anyway, and I think we are in agreement here.

Well, at Jane Street, we’ve built networked applications that can literally chew through millions of transactions per second, turning transactions around in single-digit microseconds, all in OCaml. I simply don’t think OCaml is the bottleneck here. I think a better explanation of the state of OCaml’s HTTP servers is that only a limited amount of engineering effort has gone into them.

To be clear, I don’t know the game world well, but I think there are many different things that could be called a “game server”. Many of them would work just fine with IPC, and surely there are some applications that require true shared memory.

I don’t think this is yet a fully competitive HTTP server, but Spiros’ work on http/af has shown some marked improvements.

Look at the performance graph comparing cohttp with http/af, if you want to see what a difference some engineering elbow grease can make without changing the language or the base execution model, or the concurrency monad you use, for that matter.

y

True, but this only fixes a problem that was unique to cohttp. I get your point about engineering though. It would be nice to see a proper comparison of http/af to other single-core languages.

I’m not saying there’s anything inherent to OCaml that prevents good performance more than any other functional language – if anything, the minimal boxing and single-core incremental GC should help to some degree (though Haskell’s optimizations advance rapidly and I’m not sure OCaml has much of a performance advantage there anymore). But notice that the original PR that sparked this repo mentions that they didn’t even bother comparing OCaml to Go or Haskell, since those can take advantage of all cores, which greatly reduces the work needed per core. And while I realize that you could take advantage of more cores with processes, that’s a paradigm that starts off awkwardly in OCaml and quickly becomes untenable the more state you have to share.

All of which goes to say, if I were thinking of writing a competitive web server (which shares minimal state) or a game server (which shares quite a bit of state), would I consider using OCaml? My personal answer is that I would wait for multicore OCaml to do that – the competition is simply too good (Rust, C++17, Haskell, C#, F#), and their advantage stems to a large degree from the ability to easily and effectively utilize multiple cores (ignoring Rust and C++'s added advantage of unboxing and no GC, or C#/F#'s particular advantage of reified generics).

At the same time, one could say that there are a sufficient number of areas for OCaml to compete in without occupying this particular niche. It’s certainly not a coincidence that ReasonML is taking off in the Javascript space.

I’m pretty convinced that the belief that shared memory parallelism matters for servers is a huge fallacy. If you need to really scale, you will exceed the capacity of a single server at some point, and if you do so, you need a load balancer. You might as well start by using a load balancer and multiple processes. Sure, there are cases where it really matters (see Hack’s typechecker) but they are not nearly as common as people make them.

As for video games … either you have lots of small instances (individual matches) and you can keep your state on one thread, or you have ReallyBigInstances (MMOs) and you can’t assume one server is enough at all. Most MMOs actually use instances to avoid being in the second situation completely.

While we are on the amusing exercise of pulling examples out of our asses, I also have one: Did you know that Eve Online’s engine (the MMO that regularly makes headlines for ridiculously large battles with thousands of players) is completely single-threaded? Each solar system can only run in one process. All the magic is in the load balancing, to ensure that systems with high load have one process only for themselves. There are multiple devblogs on the topic.

Maybe other languages are better, maybe they aren’t. OCaml-multicore will certainly be useful when we get it. But the argument that shared memory is essential for web and game servers still sounds like commonly-propagated nonsense.

It’s all about how much state/data you need to share. If you can shard your data across cores, or even across nodes, you don’t need to worry about shared memory. If you need to heavily share data between multiple cores, not using the memory bus to share them efficiently puts you at a severe disadvantage compared to other languages, and every option OCaml has right now is a poor band-aid. A seemingly ideal solution would have been a shared-memory area (it works for Python and Ruby), but a tracing GC doesn’t mesh well with that.
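
The shardable case can be sketched in a few lines: every key (say, a match or solar-system id, both invented here for illustration) is always routed to the same worker, so that worker owns the key’s state outright and nothing is shared. `shard_of` and `partition` are hypothetical names, and hashing with `Hashtbl.hash` is just one simple routing choice.

```ocaml
(* Sketch only: route each key to a fixed worker so that no state is shared.
   Names are hypothetical. Hashtbl.hash is non-negative, so the result
   is always in the range [0, workers). *)
let shard_of ~workers key = Hashtbl.hash key mod workers

(* Group a batch of keys by the worker that owns them, keeping input order. *)
let partition ~workers keys =
  let buckets = Array.make workers [] in
  List.iter
    (fun k ->
      let i = shard_of ~workers k in
      buckets.(i) <- k :: buckets.(i))
    keys;
  Array.map List.rev buckets
```

This breaks down in exactly the situation described above: once one request needs state owned by two different shards, the workers have to talk to each other or share memory.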

Having built a distributed system in C++ (not specifically a server) where not having shared memory was a bottleneck (due to design issues), I can tell you it’s a very big deal for performance in instances where you can’t shard the data. Also, the reality is far more fluid: what starts out as a perfectly sharded situation easily evolves into one where you need to share data, if only for performance improvement. This is actually true for any application that at some point decides multithreading could be helpful – you start off not knowing if you’ll need multithreading later on. Not having that option in the language is a huge turn-off.

This is interesting. A quick Google search seems to indicate they’re really struggling with their server architecture. Parts of their architecture are in Stackless Python, parts have been converted to multicore, and parts use InfiniBand for performance.

I think we should close up this thread. We’ve descended into a conversation about trade-offs in system design. I disagree with your conclusions, but it has nothing to do with OCaml.

The technical questions about OCaml are clear:

  • OCaml has an efficient incremental GC, which is important for handling many clients with low latency.
  • OCaml allows the use of shared memory segments for special-purpose shared data structures, but there’s no shared GC’d heap.

I think we can leave people to their own devices in choosing whether that is a suitable platform for building web-servers or game-servers, or whatever else they want to build. I’m happy to talk about how we built networked services with OCaml, but not on this thread.

y

I’m a bit late to this discussion, but I want to share my view and experience.

In my office we use 2 languages: Python (90%) and OCaml (10%).

(Ok, we also use JS but my mind refuses to consider Javascript a programming language :wink: )

I wrote many network services in OCaml: one in particular runs 24/7 under a huge load and collects critical SCADA data from a network of industrial devices, each connected to a Linux box via an old RS485 link. On these boxes a client is running, also written in OCaml.

The server and the clients have been written using core/async, so no multithreading, and this is a deliberate design decision, not a “runtime limitation”.

NGINX, the most popular HTTP server, is essentially single-threaded: it runs a small number of worker processes, each driving its own event loop.

The lack of “web servers” (what is a web server? HTTP? Application server?) written in OCaml only means that nobody in the OCaml community has time to waste writing a complex piece of software that… already exists: I used Apache for many years and now NGINX, and I really don’t feel the need for an “OHTTP”.

You can believe your favourite programming language is the best out there, and still you can refrain from re-writing everything using it, e.g. xmonad, node.js, countless projects written in Go and, of course, J* (J for Java, not Jane Street).

@pdonadeo, thanks for sharing your experience. I would be interested to know how you are using OCaml behind Nginx. Are you using OCaml as an FCGI component of Nginx, perhaps?

For what it’s worth, I use Logarion behind Nginx by proxying Nginx connections to Logarion (which uses opium).

At the moment I don’t have websites written in OCaml, if we exclude my dead blog. It’s a very old piece of software written with a micro framework of my own, based on Ocamlnet. In that case the application is connected to the HTTP frontend via FastCGI, a connector provided by Ocamlnet.

AFAIK the way to serve Ocsigen is via reverse proxy, at least in a scenario in which you have several websites/applications hosted on one server with a traditional setup (e.g. no virtual machines).

In my dreams I’d like to see Eliom + FastCGI or Eliom + WSGI or another protocol.

I am not as experienced with OCaml as the rest of you, but I write a lot of servers, so here’s my two cents.

Of course it is, otherwise I wouldn’t use it. And while shared-memory parallelism isn’t necessary for most problems you encounter, it’s a lot fucking easier at first to think this way.

For example, I was writing a P2P CDN in Scala that uses a two-hop hierarchical Chord variant. Because it had to handle a lot of leaves and joins while keeping lookup latency as low as possible, managing the state of the routing table was paramount.

So there was a lot of gossip, of both the rumor-spreading and anti-entropy varieties, and literally every process needed to access the routing table constantly: push-then-pull anti-entropy, push rumor spreading, bootstrapping, routing to replicas, etc. Now I am working on successor projects; it is doable in OCaml.

Do you have an example of that:
“shared memory segments for special-purpose shared datastructures”?
What do you use to do that?
Is it an mmapped Bigarray?