Using Ocaml to write network interface drivers

Dmitry_Ponyatov · February 11, 2020, 8:59pm

Does OCaml still have LLVM bindings off the shelf?
They could be used to boost speed with custom code generation in JIT mode.

XVilka · February 12, 2020, 1:58am

It does; see the corresponding opam package.

Chet_Murthy · February 12, 2020, 3:22am

As an aside, you might be interested/amused to read about the Ensemble network protocol system, written by Mark Hayden at Cornell back in 1997. Back then the native-code compiler was … experimental (IIRC) and he still achieved impressive results. There’s a chapter on the techniques he used, and it’s quite educational – managing memory (not leaving it up to the GC) was very important, and he did a lot of work to make it tractable.

The protocols he implemented were “full virtual synchrony”, which is far more involved than TCP, and his code was fully competitive with an existing C implementation.

gadmm · February 14, 2020, 12:21am

Reference for the curious: Mark Hayden’s PhD thesis, https://ecommons.cornell.edu/handle/1813/7316, Chapter 4.

A very good and interesting read. Thank you @Chet_Murthy.

Given Chet’s claim that it beat an existing C implementation, I was wondering whether it was another instance of the meme “I can beat C with (insert your favourite functional PL) if I restrict myself to this tiny dialect that looks nothing like functional programming!”. Not at all! These are all techniques you have already read about here on Discuss. Summary:

1. Get good latency by allocating very little and avoiding promotion to the major heap in the fast path.
2. To avoid allocating, use inlining to eliminate the closures allocated for higher-order iteration, etc.
3. Also avoid allocations by preallocating closures outside the fast path.
4. Use inlining also to remove the cost of abstractions.
5. Allocate buffers manually outside of the OCaml heap, in arenas (in C), and manage them manually with reference counting.
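To make the closure-related points concrete, here is a minimal sketch (not Hayden's code) built around a hypothetical fast path that sums payload lengths. The function names are illustrative. Whether the first version really allocates depends on the compiler: flambda may inline `Array.iter` and remove the closure. Treat the comments as typical behaviour, not a guarantee, and check with the `Gc.minor_words` helper.

```ocaml
(* Hypothetical fast path: total payload length of a batch of messages. *)

(* Allocating version: the closure passed to Array.iter captures the
   local ref [total], so without inlining of Array.iter each call
   typically allocates both the ref cell and the closure. *)
let total_len_alloc (msgs : Bytes.t array) =
  let total = ref 0 in
  Array.iter (fun m -> total := !total + Bytes.length m) msgs;
  !total

(* Preallocated version: the mutable state and the closure are created
   once, outside the fast path. [add_len] is a closed top-level function,
   so passing it to Array.iter does not allocate. Not reentrant. *)
type acc = { mutable total : int }

let acc = { total = 0 }

let add_len m = acc.total <- acc.total + Bytes.length m

let total_len_prealloc (msgs : Bytes.t array) =
  acc.total <- 0;
  Array.iter add_len msgs;
  acc.total

(* What inlining buys you: a plain loop whose ref never escapes, so the
   compiler turns [total] into a mutable local and the loop allocates nothing. *)
let total_len_loop (msgs : Bytes.t array) =
  let total = ref 0 in
  for i = 0 to Array.length msgs - 1 do
    total := !total + Bytes.length msgs.(i)
  done;
  !total

(* Words allocated on the minor heap while running [f x]. *)
let minor_words_used f x =
  let before = Gc.minor_words () in
  ignore (Sys.opaque_identity (f x));
  Gc.minor_words () -. before

let () =
  let msgs = Array.init 4 (fun i -> Bytes.create (10 * (i + 1))) in
  List.iter
    (fun (name, f) ->
      Printf.printf "%-8s len=%d minor words=%.0f\n" name (f msgs)
        (minor_words_used f msgs))
    [ ("alloc", total_len_alloc); ("prealloc", total_len_prealloc);
      ("loop", total_len_loop) ]
```

All three versions return the same length. The trade-off in the preallocated version is shared mutable state, which is fine on a single-threaded fast path but not reentrant.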

The last technique is less common. Because messages have similar lifetimes, an arena is rarely kept alive by a single message. When handling messages zero-copy, he reports that managing the buffers with the GC causes fragmentation and that 25% of the time is spent in the GC for moderate workloads. A figure in the chapter shows that the benefits of zero-copy are negated for medium-sized messages; only manual reference counting keeps latency low whatever the message size. @kayceesrk mentioned a similar problem to me some time ago, but I was missing a good reference until now.
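For readers who want to see the shape of the arena idea, here is a minimal sketch under stated assumptions. Hayden's arenas were written in C. Here `Bigarray` stands in for C-allocated memory, because its payload lives outside the OCaml heap. All names (`arena_size`, `alloc`, `retain`, `release`, `free_arenas`) are hypothetical. Each arena counts the live messages carved from it and goes back on a free list, for reuse, once that count drops to zero.

```ocaml
(* Off-heap byte storage: Bigarray data is not moved or scanned by the GC. *)
type buf = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type arena = {
  mem : buf;           (* allocated once, then recycled *)
  mutable used : int;  (* bump-pointer offset *)
  mutable refs : int;  (* manual reference count of live messages *)
}

(* A message is a small on-heap handle onto a slice of an arena. *)
type msg = { arena : arena; off : int; len : int }

let arena_size = 4096 (* hypothetical size *)

let free_arenas : arena list ref = ref []   (* arenas with refs = 0 *)
let current : arena option ref = ref None   (* arena being bump-allocated *)

let is_current a = match !current with Some c -> c == a | None -> false

let new_arena () =
  match !free_arenas with
  | a :: rest -> free_arenas := rest; a.used <- 0; a
  | [] ->
    { mem = Bigarray.Array1.create Bigarray.char Bigarray.c_layout arena_size;
      used = 0; refs = 0 }

let alloc len =
  if len > arena_size then invalid_arg "alloc: message larger than arena";
  let a =
    match !current with
    | Some a when a.used + len <= arena_size -> a
    | old ->
      (* Retire the full arena; recycle it now if nothing references it. *)
      (match old with
       | Some o when o.refs = 0 -> free_arenas := o :: !free_arenas
       | _ -> ());
      let a = new_arena () in
      current := Some a;
      a
  in
  let m = { arena = a; off = a.used; len } in
  a.used <- a.used + len;
  a.refs <- a.refs + 1;
  m

(* Another holder (e.g. a zero-copy forward) takes a reference. *)
let retain m = m.arena.refs <- m.arena.refs + 1

(* Drop a reference; a retired arena with no references is recycled.
   As with any manual scheme, using [m] after its last release is a bug. *)
let release m =
  let a = m.arena in
  if a.refs <= 0 then invalid_arg "release: refcount underflow";
  a.refs <- a.refs - 1;
  if a.refs = 0 && not (is_current a) then free_arenas := a :: !free_arenas

let set m i c =
  if i < 0 || i >= m.len then invalid_arg "set";
  m.arena.mem.{m.off + i} <- c

let get m i =
  if i < 0 || i >= m.len then invalid_arg "get";
  m.arena.mem.{m.off + i}
```

Only the small `msg` handles are allocated on the OCaml heap. The payload bytes are recycled explicitly, so the GC never has to trace or compact them.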

The discussion on the suitability of ML is thorough and well-written.

Hayden mentions that several improvements to OCaml were made as a response to his requests.

Chet_Murthy · February 14, 2020, 2:30am

Yes, #5 was the real killer. Two further thoughts:

(1) Mark went on to rewrite Ensemble in C and got similar performance. He observed that without having first written it in ML, it would have been much, much, much harder to write C-Ensemble.

(2) This issue of “explicitly manage buffers” is a really important one.

For decades (at least since 1986, when I first noticed), GC jocks have been promising us (in LISP, Scheme, Standard ML, Smalltalk, Java) that the NEXT GREAT ALGORITHM would make it unnecessary to explicitly manage memory (in GCed languages). Literally they’ve been doing this dance for >30 years. And it has NEVER worked out. [But hope springs eternal … NEXT time …] Why? [rank conjecture here] Machines get faster, memories get bigger, cache sizes don’t keep up, TLBs don’t keep up. Maybe other things. It’s true that as machines get faster, the class of applications you can solve without “working really hard” grows. But for ultimate performance, the recipe hasn’t changed, and it sure includes managing the memory lifetime of your most important data structures.

BTW, this isn’t confined to GC & GCed languages. For example, if you’re really going to do serious transaction processing, and you care about latency and performance, you’re going to use what amounts to Infiniband (aka “RoCEE”). The (Linux) kernel jocks may be doing incredible things in the TCP stack, but if you really care, you’re going to use a network card that allows you to bypass the kernel entirely, pin physical memory, do everything using polling, and avoid copies wherever possible. And this has been true since Infiniband was invented … for the transaction-processing hardware and software built by Tandem in the mid-80s.
