This is a good question, and it’s helpful to understand what each datastructure is backed by, and what operations are inefficient.
Bigarray
is a pointer to externally allocated memory of arbitrary length. It supports creating smaller views of the same memory without copying it, which is implemented at the OCaml runtime level. Accessing data within bigarrays is fast thanks to some compiler primitives which allow for endian-neutral parsing and serialisation, implemented by ocplib-endian.- Bigarrays are extremely convenient for network IO, since they support everything needed for minimal copying of data from the OS. You can exchange memory pages directly from the OS into the OCaml heap, and process them. Unfortunately, one operation is critically slow here – creating a substring. Bigarray’s provenance was originally to interop better with Fortran-style HPC code, where the size and dimensionality of arrays is generally large. For IO, we just want really speedy 1-dimension arrays, and in this usecase Bigarray substring creation is very slow due to the underlying reference counting. Thus cstruct was born, which keep a single underlying Bigarray structure and allocates small OCaml records on the minor heap for subviews. These are cheap to create and GC, and the underlying data is not copied unless requested.
String
s are immutable and sit in the OCaml heap, and require a data copy from the outside world into them. Under some circumstances (usually small allocations) they can be more performant.Buffer
s are a resizable String, and efficient if you need to concatenate lots of data of unknown size.
So the final answer, as with many systems performance problems about what is “efficient” depends on your allocation patterns. For transmitting data, there is often a number of small pieces of data that are combined onto a set of pages for the write path. In this case, a hybrid of “in-heap” assembly using small strings followed by blitting into a Bigarray is reasonable. For reading, parsing directly from a Bigarray into a cstruct works well.
This is basically all incorrect; please see above. Accessing Bigarrays can be done via builtin compiler primitives that make it fast. And the point of using them is to avoid multiple small allocations, especially on the read path.
The basic approach to low latency OCaml hasn’t really changed much in the last few decades. You just need to minimise allocation to maximise GC throughput, and OCaml makes it fairly easy to write that sort of low level code. Two papers that might be helpful:
- “Melange: Towards a functional internet”, EuroSys 2007. Contains a latency analysis of an SSH and DNS server vs C equivalents, and some techniques on writing low-latency protocol parsers. These days, we do roughly the same thing with ppx’s and cstructs, without the DSL in the way.
- “Jitsu: Just-in-Time Summoning of Unikernel;s”, NSDI 2015. This shows the benefits of whole-system latency control – you can mask latency by doing some operations concurrently, which is easy to do in unikernels and hard in a conventional OS.
We’ve never really built systems in the “soft realtime” sense so far – for example no video transmission system or isochronous Bluetooth implementations. Internet protocols are very resilient to variable latency, although of course we want to keep things as low as possible. I’ve been looking into multipath multicast video transmission in Mirage recently due to the current work-at-home situation, so that might change soon depending on how it goes
One thing that has changed in the past decade is the steadily improving latency profile of the OCaml GC, which has only been improving thanks to @damiendoligez’s steady work. That has let us get away with not directly addressing latency much in Mirage itself, as every upgrade of the compiler is a pleasant improvement.
And indeed, @xavierleroy is right that we allocate like crazy, with the caveat that this only really happens on the transmission path of most protocols. Reads tend to go through a more minimal copy discipline.
We certainly do care about this, but it has to be fixed upstream in OCaml as we have reached the limits of what we can practically do with Bigarray – I am hoping that multicore OCaml is the perfect time to unify all these IO approaches in that direction as part of that effort. Mirage will benefit from whatever happens there eventually.