TechEmpower benchmarks for opium running httpaf are ready - how to improve the results

I used @rbjorklin's addition of webmachine (cohttp) as a basis to add opium (a master commit with httpaf) as well


(I’ve filtered the results to fastify for JS, giraffe for F# and aspcore for C#)

here’s the implementation:

it uses opium (master commit with httpaf), caqti with ppx_rapper, yojson, tyxml
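For context, here is a minimal sketch of what a TechEmpower-style /json endpoint looks like with this stack. This is hypothetical illustration code, not the benchmark's actual implementation; the API names (`Opium.App`, `Opium.Response.of_json`) assume the httpaf-based Opium rewrite and may differ between versions:

```ocaml
(* Hypothetical sketch of a /json endpoint with Opium and Yojson;
   API names assume the httpaf-based Opium rewrite. *)
let message () : Yojson.Safe.t =
  `Assoc [ ("message", `String "Hello, World!") ]

let json_handler _req =
  (* Response.of_json serializes the Yojson value into the body *)
  Lwt.return (Opium.Response.of_json (message ()))

let () =
  Opium.App.empty
  |> Opium.App.get "/json" json_handler
  |> Opium.App.run_command
```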

At first glance there doesn’t appear to be anything seriously wrong with it, but the performance is not great compared to JS. Am I doing anything wrong, or should I just accept that this is the performance?

Summary (opium is opium running as a single process; opium-haproxy is multiple processes behind a reverse proxy, which is not entirely fair as a comparison with JS, which is simply forking processes)

JSON serialization:


with a reverse proxy - 2.7x slower than JS forking processes, 5x slower than F# and 8.6x slower than C#

Single query:


with a reverse proxy - 1.3x faster than JS, 1.74x slower than C# Dapper (roughly OCaml’s ppx_rapper in terms of API) and 1.3x slower than C# with a full ORM, Entity Framework

Multiple queries:


with a reverse proxy - 2x faster than JS, about the same speed as (with Kestrel) C# Dapper and 1.3x faster than C# Entity Framework

Fortunes:
(includes getting data from a DB, adding a record, ordering the new list and then rendering HTML with tyxml)


1.14x slower than JS, 1.7x slower than dapper F# and C#

Data updates:
(not all implementations are equal - some Dapper implementations batch the updates, while the OCaml and JS ones do not)


2.6x faster than JS (either JS’s postgres driver is not very good or a reverse proxy helps a lot), 1.1x slower than C# with the Entity Framework ORM

Plaintext:


I’m pretty sure something is wrong here, though - there is not much to implement wrong in these 3 lines of code:
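(The snippet itself is not reproduced here; for context, a plaintext handler of roughly this shape - hypothetical code, with Opium API names that may differ by version:)

```ocaml
(* Hypothetical plaintext handler: a constant string response,
   so there is essentially nothing to get wrong. *)
let () =
  Opium.App.empty
  |> Opium.App.get "/plaintext" (fun _req ->
         Lwt.return (Opium.Response.of_plain_text "Hello, World!"))
  |> Opium.App.run_command
```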

so I will use webmachine for comparison in this test - 4x slower than JS, 20x slower than F#, 10x slower than C#


Since the post was long, to ask my question again: even with a reverse proxy running multiple processes, the performance is only about on par with JS.
Am I doing something wrong, and how can I get better performance? Or does the native-code performance advantage disappear once a VM has a chance to optimize hot code paths?

6 Likes

Regarding your plaintext implementation: that kind of looks synchronous to me. I am by no stretch of the imagination well-versed in asynchronous programming, so I could be off the mark here, but the documentation says:

Lwt.return : 'a → 'a Lwt.t
creates a promise which is already fulfilled with the given value

https://ocsigen.org/lwt/5.2.0/manual/manual

EDIT: Nevermind that was in the response and everything looks good.
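As an aside, the “already fulfilled” behavior of `Lwt.return` can be checked directly (a minimal sketch, independent of the benchmark code):

```ocaml
(* Lwt.return wraps a value in an already-fulfilled promise; no I/O or
   scheduling happens, so its state can be inspected without an event loop. *)
let p : string Lwt.t = Lwt.return "Hello, World!"

let () =
  match Lwt.state p with
  | Lwt.Return s -> print_endline ("fulfilled with: " ^ s)
  | Lwt.Sleep | Lwt.Fail _ -> print_endline "not fulfilled"
```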

@mudrz my guess is that OCaml can go a lot faster, but the problem is that there aren’t many people doing web-related work in OCaml, so not many man-hours have gone into this area. Play around with DB connection pool sizes, try profiling to see where the implementation spends the most time, and go from there. Maybe there are issues related to lwt, http/af, opium, yojson or anything else. Maybe someone with production OCaml experience can point something out? Anyway, I think that we as a community need to consolidate and create a viable web dev ecosystem to attract more people, so that more man-hours become available to grow and enhance the ecosystem as a whole - because most people are doing web-related work nowadays.

3 Likes

Huh. Comparing the results before and after libev was installed, the change for single-core webmachine basically falls within the margin of error. This surprises me, as I saw a 3x improvement when running the benchmark locally.

EDIT: Nevermind that was in the response and everything looks good.

yep, I’ve been using the let+ syntax extension in the other responses, which wraps the code beneath it in an Lwt.t
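For readers unfamiliar with it, `let+` comes from `Lwt.Syntax` and maps the body over a promise, so the whole expression is again an `Lwt.t` (a minimal sketch, not the benchmark code):

```ocaml
open Lwt.Syntax

(* let+ maps the body over the promise: the result is a string Lwt.t,
   not a plain string, so the code stays asynchronous. *)
let shout (greeting : string Lwt.t) : string Lwt.t =
  let+ g = greeting in
  String.uppercase_ascii g

let () = print_endline (Lwt_main.run (shout (Lwt.return "hello")))
(* prints HELLO *)
```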

my guess is that Ocaml can go a lot faster

I’m hoping that it can (without too much time :slight_smile:), because I want to use it

3x improvement when running the benchmark locally.

I really hope we do get a 3x improvement from a simple missing dependency

I also created a PR https://github.com/TechEmpower/FrameworkBenchmarks/pull/6085 that removes the dependency from lib and leaves it only in bin, so that it is super easy to copy-paste or symlink lib and add other OCaml frameworks to compare on the same code

curious to see how @ulrikstrid’s Morph fares out of the box

1 Like

This data makes no sense to me: on a single core on a Ryzen 3000 I get 3x the performance of NodeJS answering a hello world using the http node module.

const http = require("http");
const answerHello = (req, res) => res.end("Hello");
http.createServer(answerHello).listen(8080);

Even the opium version talking to a redis on each GET is faster than the hello world in NodeJS.

You can try adding conf-libev; both esy and opam support it

1 Like

When I benchmarked, I got pretty good performance (about 50% as good as F#). I agree that there has been little tuning, so it’s hard to tell what to fix. I’ll note that I also had trouble making sure that libev was working; see here for code that will tell you at runtime whether it is.
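Such a runtime check can be as small as this (a sketch using `Lwt_sys.have`, which reports whether libev support was compiled into Lwt; on Unix the default engine uses libev when it is available):

```ocaml
(* Prints whether this Lwt build has libev support compiled in. *)
let () = Printf.printf "libev available: %b\n" (Lwt_sys.have `libev)
```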

1 Like

Soon there will be the option to use libuv (see https://github.com/ocsigen/lwt/pull/811), which may also be interesting from this perspective. Unlike with libev, luv bundles libuv, so if one uses that engine the library is guaranteed to be there (if I am not mistaken)

3 Likes

It would be interesting to optimise the database stuff. I think that we’re kind of stuck on the concurrency in plaintext and JSON, since we’re basically not doing any I/O, so we’re always going to saturate the single core we have.

re conf-libev:
it is already installed here:

just tried starting the opium.dockerfile container with the options printing added:

docker build -t opi -f opium.dockerfile . && docker run opi
option fd_passing: true
option fdatasync: true
option get_affinity: true
option get_cpu: true
option get_credentials: true
option libev: true
option madvise: true
option mincore: true
option recv_msg: true
option send_msg: true
option set_affinity: true
option wait4: true

wow, using libuv sounds great - this should give performance a boost

now I wonder how Async would fare - is there a server with Async support we can test? Does Jane Street have one?

Httpaf has Async support. I haven’t used it, but there is a library on opam that provides a higher-level interface to it: httpaf_caged (1.0.1)

At one point I had a need for an async HTTP solution, so I had a wrapper around httpaf as well: https://github.com/anuragsoni/http_async (an asynchronous HTTP/1.1 server for OCaml)
I don’t use it anymore but feel free to borrow anything from there.

thanks @anuragsoni - I looked into Async, the available libraries, documentation, etc., but the effort does not seem to be worth it; the time is better spent elsewhere

looking forward to the libuv PR though

@blandinw added vanilla httpaf in this PR and it seems to be holding up A LOT better than any other OCaml additions to the benchmark suite. See here for first round that includes httpaf. While waiting for that round to complete you can see the other OCaml web frameworks here.

1 Like

the JSON results look promising - they are similar to F#;
so it’s not that OCaml is slow, but rather that it’s not used commercially for web servers -> there are no optimised libraries;

the plaintext results show that there is an issue with httpaf and plaintext though

Great, I’ll make a post when the benchmark run completes to compare with implementations from other languages and share some learnings and potential next steps.
EDIT: the plaintext issue is most likely httpaf#189

1 Like

This seems related to the plaintext benchmarks using HTTP pipelining. The recently merged PR inhabitedtype/httpaf#190 (“Fix unresponsiveness with multiple requests in a single buffer”, by mefyl) should help httpaf in this benchmark. I tested this PR locally, and the results with pipelining now look closer to the other frameworks in the benchmark.

Perhaps someone should tweak the other additions to also spread out across as many processes as there are cores, just like @blandinw did.
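That pre-fork pattern can be sketched as follows (a hypothetical helper, not code from the benchmark; in practice the listening socket is bound before forking so that every worker accepts from it):

```ocaml
(* Hypothetical pre-fork helper: run [serve] in [n] processes in total
   (the parent counts as one), so each core gets its own event loop. *)
let prefork (n : int) (serve : unit -> unit) : unit =
  for _ = 2 to n do
    match Unix.fork () with
    | 0 -> serve (); exit 0   (* child: serve and exit *)
    | _pid -> ()              (* parent: keep forking *)
  done;
  serve ()                    (* parent serves as well *)
```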

As a side note, the fact that such discrepancies exist across implementations makes me trust these synthetic benchmarks even less.

2 Likes

Well, these benchmarks are definitely to be taken with a grain of salt! However, if you make sure the optimizations and logic are the same by looking at the code, they can still provide useful insights. They’re also a good reminder of the cost of abstractions/frameworks.

edit: This benchmark in particular is also well-known, so it’s a good way to get new users to try OCaml