TechEmpower benchmarks for opium running httpaf are ready - how to improve the results

I used @rbjorklin's addition of webmachine (cohttp) as a basis to add opium (a master commit with httpaf) as well


(I’ve filtered the results to fastify for JS, giraffe for F# and aspcore for C#)

here’s the implementation:

it uses opium (master commit with httpaf), caqti with ppx_rapper, yojson, tyxml
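For context, here is a minimal sketch of what a TechEmpower-style /json endpoint looks like with this stack. This is hypothetical illustration code, not the benchmark's actual implementation; the API names (`Opium.App`, `Opium.Response.of_json`) assume the httpaf-based Opium rewrite and may differ between versions:

```ocaml
(* Hypothetical sketch of a /json endpoint with Opium and Yojson;
   API names assume the httpaf-based Opium rewrite. *)
let message () : Yojson.Safe.t =
  `Assoc [ ("message", `String "Hello, World!") ]

let json_handler _req =
  (* Response.of_json serializes the Yojson value into the body *)
  Lwt.return (Opium.Response.of_json (message ()))

let () =
  Opium.App.empty
  |> Opium.App.get "/json" json_handler
  |> Opium.App.run_command
```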

At first glance there doesn’t appear to be anything seriously wrong with it, but the performance is not great compared to JS. Am I doing anything wrong, or should I just accept that this is the performance?

Summary (opium is opium running as a single process; opium-haproxy is multiple processes behind a reverse proxy, which is not entirely fair as a comparison with JS, which is simply forking processes)

JSON serialization:


with a reverse proxy - 2.7x slower than JS forking processes, 5x slower than F# and 8.6x slower than C#

Single query:


with a reverse proxy - 1.3x faster than JS, 1.74x slower than C# Dapper (roughly OCaml’s ppx_rapper in terms of API) and 1.3x slower than C# with a full ORM, Entity Framework

Multiple queries:


with a reverse proxy - 2x faster than JS, about the same speed as (with Kestrel) C# Dapper and 1.3x faster than C# Entity Framework

Fortunes:
(includes getting data from a DB, adding a record, ordering the new list and then rendering HTML with tyxml)


1.14x slower than JS, 1.7x slower than dapper F# and C#

Data updates:
(not all implementations are equal - some Dapper implementations batch the updates, while the OCaml and JS ones do not)


2.6x faster than JS (either JS’s postgres driver is not very good or a reverse proxy helps a lot), 1.1x slower than C# with the Entity Framework ORM

Plaintext:


I’m pretty sure something is wrong here, though - there is not much to implement wrong in these 3 lines of code:
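(The snippet itself is not reproduced here; for context, a plaintext handler of roughly this shape - hypothetical code, with Opium API names that may differ by version:)

```ocaml
(* Hypothetical plaintext handler: a constant string response,
   so there is essentially nothing to get wrong. *)
let () =
  Opium.App.empty
  |> Opium.App.get "/plaintext" (fun _req ->
         Lwt.return (Opium.Response.of_plain_text "Hello, World!"))
  |> Opium.App.run_command
```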

so I will use webmachine for comparison in this test - 4x slower than JS, 20x slower than F#, 10x slower than C#


Since the post was long, to ask my question again: even with a reverse proxy running multiple processes, the performance is only about on par with JS.
Am I doing something wrong, and how can I get better performance? Or does the native-code performance advantage disappear once a VM has a chance to optimize hot code paths?

6 Likes

Regarding your plaintext implementation: that kind of looks synchronous to me. I am by no stretch of the imagination well-versed in asynchronous programming, so I could be off the mark here, but the documentation says:

Lwt.return : 'a → 'a Lwt.t
creates a promise which is already fulfilled with the given value

https://ocsigen.org/lwt/5.2.0/manual/manual

EDIT: Nevermind that was in the response and everything looks good.
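As an aside, the “already fulfilled” behavior of `Lwt.return` can be checked directly (a minimal sketch, independent of the benchmark code):

```ocaml
(* Lwt.return wraps a value in an already-fulfilled promise; no I/O or
   scheduling happens, so its state can be inspected without an event loop. *)
let p : string Lwt.t = Lwt.return "Hello, World!"

let () =
  match Lwt.state p with
  | Lwt.Return s -> print_endline ("fulfilled with: " ^ s)
  | Lwt.Sleep | Lwt.Fail _ -> print_endline "not fulfilled"
```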

@mudrz my guess is that OCaml can go a lot faster, but the problem is that there aren’t many people doing web-related work in OCaml, so not many man-hours have gone into this area. Play around with DB connection pool sizes, try profiling to see where the implementation spends the most time, and go from there. Maybe there are issues related to lwt, http/af, opium, yojson or anything else. Maybe someone with production OCaml experience can point something out? Anyway, I think that we as a community need to consolidate and create a viable web dev ecosystem to attract more people, so that more man-hours become available to grow and enhance the ecosystem as a whole - because most people are doing web-related work nowadays.

3 Likes

Huh. Comparing the results before and after libev was installed, the change for single-core webmachine basically falls within the margin of error. This surprises me, as I saw a 3x improvement when running the benchmark locally.

EDIT: Nevermind that was in the response and everything looks good.

yep, I’ve been using the let+ syntax extension in the other responses, which wraps the code beneath it in an Lwt.t
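For readers unfamiliar with it, `let+` comes from `Lwt.Syntax` and maps the body over a promise, so the whole expression is again an `Lwt.t` (a minimal sketch, not the benchmark code):

```ocaml
open Lwt.Syntax

(* let+ maps the body over the promise: the result is a string Lwt.t,
   not a plain string, so the code stays asynchronous. *)
let shout (greeting : string Lwt.t) : string Lwt.t =
  let+ g = greeting in
  String.uppercase_ascii g

let () = print_endline (Lwt_main.run (shout (Lwt.return "hello")))
(* prints HELLO *)
```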

my guess is that Ocaml can go a lot faster

I’m hoping that it can (without too much time :slight_smile:), because I want to use it

3x improvement when running the benchmark locally.

I really hope we do get a 3x improvement from a simple missing dependency

I also created a PR https://github.com/TechEmpower/FrameworkBenchmarks/pull/6085 that removes the dependency from lib and leaves it only in bin, so that it is super easy to copy-paste or symlink lib and add other OCaml frameworks to compare on the same code

curious to see how @ulrikstrid’s Morph fares out of the box

1 Like

This data makes no sense to me: on a single core on a Ryzen 3000 I get 3x the performance of NodeJS answering a hello world using the http node module.

const http = require("http");
const answerHello = (req, res) => res.end("Hello");
http.createServer(answerHello).listen(8080);

Even the opium version talking to a redis on each GET is faster than the hello world in NodeJS.

You can try adding conf-libev; both esy and opam support it

1 Like

When I benchmarked, I got pretty good performance (about 50% as good as F#). I agree that there has been little tuning, so it’s hard to tell what to fix. I’ll note that I also had trouble making sure that libev was working; see here for code that will tell you at runtime whether it is.
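Such a runtime check can be as small as this (a sketch using `Lwt_sys.have`, which reports whether libev support was compiled into Lwt; on Unix the default engine uses libev when it is available):

```ocaml
(* Prints whether this Lwt build has libev support compiled in. *)
let () = Printf.printf "libev available: %b\n" (Lwt_sys.have `libev)
```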

1 Like

Soon there will be the option to use libuv (see https://github.com/ocsigen/lwt/pull/811), which may also be interesting from this perspective. Unlike with libev, luv bundles libuv, so if one uses that engine the library is guaranteed to be there (if I am not mistaken)

3 Likes

It would be interesting to optimise the database stuff. I think that we’re kind of stuck on the concurrency in plaintext and JSON, since we’re basically not doing any I/O, so we’re always going to saturate the single core we have.

re conf-libev:
it is already installed here:

just tried starting the opium.dockerfile container with the options printing added:

docker build -t opi -f opium.dockerfile . && docker run opi
option fd_passing: true
option fdatasync: true
option get_affinity: true
option get_cpu: true
option get_credentials: true
option libev: true
option madvise: true
option mincore: true
option recv_msg: true
option send_msg: true
option set_affinity: true
option wait4: true

wow, using libuv sounds great - this should give performance a boost

now I wonder how Async would fare - is there a server with Async support we can test? Does Jane Street have one?

Httpaf has Async support. I haven’t used it, but there is a library on opam that provides a higher-level interface to it: httpaf_caged (1.0.1)

At one point I had a need for an async HTTP solution, so I had a wrapper around httpaf as well: https://github.com/anuragsoni/http_async (an asynchronous HTTP/1.1 server for OCaml)
I don’t use it anymore but feel free to borrow anything from there.

thanks @anuragsoni - I looked into Async, the available libraries, documentation, etc., but the effort does not seem to be worth it; the time is better spent elsewhere

looking forward to the libuv PR though

@blandinw added vanilla httpaf in this PR and it seems to be holding up A LOT better than any other OCaml additions to the benchmark suite. See here for first round that includes httpaf. While waiting for that round to complete you can see the other OCaml web frameworks here.

1 Like

the JSON results look promising - they are similar to F#;
so it’s not that OCaml is slow, but rather that it’s not used commercially for web servers -> there are no optimised libraries;

the plaintext results show that there is an issue with httpaf and plaintext though

Great, I’ll make a post when the benchmark run completes to compare with implementations from other languages and share some learnings and potential next steps.
EDIT: the plaintext issue is most likely httpaf#189

1 Like

This seems related to the plaintext benchmarks using HTTP pipelining. The recently merged PR inhabitedtype/httpaf#190 (“Fix unresponsiveness with multiple requests in a single buffer”, by mefyl) should help httpaf in this benchmark. I tested this PR locally, and the results with pipelining now look closer to the other frameworks in the benchmark.

Perhaps someone should tweak the other additions to also spread out across as many processes as there are cores, just like @blandinw did.
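That pre-fork pattern can be sketched as follows (a hypothetical helper, not code from the benchmark; in practice the listening socket is bound before forking so that every worker accepts from it):

```ocaml
(* Hypothetical pre-fork helper: run [serve] in [n] processes in total
   (the parent counts as one), so each core gets its own event loop. *)
let prefork (n : int) (serve : unit -> unit) : unit =
  for _ = 2 to n do
    match Unix.fork () with
    | 0 -> serve (); exit 0   (* child: serve and exit *)
    | _pid -> ()              (* parent: keep forking *)
  done;
  serve ()                    (* parent serves as well *)
```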

As a side note, the fact that such discrepancies exist across implementations makes me trust these synthetic benchmarks even less.

2 Likes

Well, these benchmarks are definitely to be taken with a grain of salt! However, if you make sure the optimizations and logic are the same by looking at the code, they can still provide useful insights. They’re also a good reminder of the cost of abstractions/frameworks.

edit: This benchmark in particular is also well-known, so it’s a good way to get new users to try OCaml