Buffered IO, bytes vs bigstring

Not sure if this question really makes sense, but here goes.

Jane St. libs use the venerable “bigstring” as a buffer type. Other IO libs use the more recent “bytes”. My guess is that it is not possible to zero-copy coerce between these types. So a choice has to be made as to which to use.

Various factors may weigh on this choice. For example, Jane St. has lots of libs that already use bigstring, so you should use bigstring if you want to take advantage of these libs. On the other hand, there are also lots of libs that use bytes.

My question: is there any intrinsic reason to prefer one or the other?


I’m still in my first year of OCaml, so take this with a grain of salt, but I’m leaning towards Bigarray for buffers. As you noticed, it’s used in a lot of places because it is efficient, not just for interfacing with C but also on its own for codecs and I/O.

It’s used in its “bigstring” form in several libraries: Core, ctypes, Lwt_bytes and bigstringaf, to name a few. It’s also used by the standard distribution itself, in Unix.map_file (since 4.06), which is great for reading (not writing) files.

Since some libraries use Buffer.t and some use string or bytes, some copying is unavoidable. We can only minimize it.

Note: by “bigstring” I mean any type resolving to (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
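To make that definition concrete, here is a small sketch of the type and of what moving data between bytes and a bigstring costs. The helper names (create_bigstring, bigstring_of_bytes, bytes_of_bigstring) are hypothetical; libraries such as Core or bigstringaf ship their own, faster blit functions implemented in C. The point is only that, since there is no zero-copy coercion, every conversion is a full copy.

```ocaml
(* The "bigstring" type as defined above: a one-dimensional,
   C-layout Bigarray of chars. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: allocate an uninitialized bigstring of [len] bytes.
   Its data lives outside the OCaml heap (allocated by the C runtime). *)
let create_bigstring (len : int) : bigstring =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout len

(* Hypothetical helper: no zero-copy coercion exists, so converting
   bytes to a bigstring copies every byte into fresh memory. *)
let bigstring_of_bytes (b : Bytes.t) : bigstring =
  let len = Bytes.length b in
  let bs = create_bigstring len in
  for i = 0 to len - 1 do
    Bigarray.Array1.unsafe_set bs i (Bytes.unsafe_get b i)
  done;
  bs

(* Hypothetical helper: the reverse direction is a copy as well. *)
let bytes_of_bigstring (bs : bigstring) : Bytes.t =
  Bytes.init (Bigarray.Array1.dim bs) (fun i -> Bigarray.Array1.unsafe_get bs i)
```

Mutating the result of bigstring_of_bytes does not affect the original bytes, which is exactly why a choice of representation has to be made up front.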

It mostly depends on your allocation policy and the lifetime of your buffers. The question is difficult and depends on your context, but I can list some characteristics of each:

  • A bigstring/bigarray cannot be relocated by the GC. That means the buffer never moves, even when the GC runs a collection cycle.
  • Because the buffer is never relocated, we can release the runtime lock (the “GC lock”) while working on it. This is what digestif, an implementation of several hash algorithms, does. We know these algorithms mostly “calculate”: for instance, they don’t allocate. So we can declare that the upcoming computation runs independently of the GC, and in the context of lwt/async (or even multicore) this allows a kind of true parallelism.
  • Bigarray.Array1.sub allocates a “proxy” of the initial bigarray. A sub does not copy the bigarray; it gives you a smaller representation that provides access to a slice of it. For example, mirage-tcpip inspects a TCP/IP packet through a succession of subs, which gives zero-copy access from the received packet all the way to the application layer.
    • For this specific aspect, the reality is a bit more complex. Even if we only want a smaller representation of the given bigarray (a slice), that representation is allocated in the major heap (though I think this is no longer true since this commit). This is why cstruct appeared: it keeps the ability to take slices of a bigarray while allocating them in the minor heap (which is faster than the major heap). As a result, a nice API now exists to manipulate bigarrays while taking advantage of this.
  • The compiler specializes accesses to int32 and int64 Bigarrays (when the element kind is known statically). This means that when you manipulate such values, the compiler can avoid an extra allocation when reading them from or writing them to the bigarray. Some computations can become much faster this way than with an int8 Bigarray and {get,set}_int{32,64} functions, which read and write these values in a given serialized form (endianness).
  • Small bytes values (smaller than Max_young_wosize = 256 words) are allocated on the minor heap, which “just” consists of preparing a new block and shifting the allocation pointer of the stop-and-copy minor heap (which is very fast).
  • You can use Bytes.unsafe_{of,to}_string to convert between string and bytes for free, since at runtime they have the same representation, while the type system still keeps you from illegally mutating a string.
  • If you want to mmap, you must use a bigarray.
  • If you want to share a buffer between multiple processes, you must use a bigarray, again because the GC will never move the buffer. This is what I am trying to do with rowex, a small persistent index.
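The slicing point about Bigarray.Array1.sub can be checked with a small experiment. This is a sketch using only the Stdlib Bigarray and Bytes modules; the function names are hypothetical. A bigarray sub is a view into the same memory, whereas Bytes.sub returns a fresh copy.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Take a 3-byte slice starting at offset 2, write [c] at the start of the
   slice, and report what the parent buffer holds at offset 2 afterwards.
   Assumes the buffer is at least 5 bytes long. *)
let write_via_bigarray_sub (bs : bigstring) (c : char) : char =
  (* Array1.sub allocates a small proxy pointing into the same memory. *)
  let view = Bigarray.Array1.sub bs 2 3 in
  Bigarray.Array1.set view 0 c;
  Bigarray.Array1.get bs 2

(* Same experiment with bytes: Bytes.sub copies, so the write never
   reaches the original buffer. *)
let write_via_bytes_sub (b : Bytes.t) (c : char) : char =
  let copy = Bytes.sub b 2 3 in
  Bytes.set copy 0 c;
  Bytes.get b 2

let () =
  let bs = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 8 in
  Bigarray.Array1.fill bs 'a';
  let b = Bytes.make 8 'a' in
  (* prints "bigarray: X, bytes: a" *)
  Printf.printf "bigarray: %c, bytes: %c\n"
    (write_via_bigarray_sub bs 'X') (write_via_bytes_sub b 'X')
```

Getting the same sharing with bytes means passing (buffer, offset, length) triples around by hand, which is roughly the bookkeeping cstruct does on top of a bigarray.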

I think other characteristics exist, but again, it really depends on what you want to do. For instance:

  • decompress (an implementation of zlib) uses bigarray because it’s fair to assume that the input and output buffers will have a looong life :slight_smile:.
  • On the other hand, digestif supports both types: bigarrays, to take advantage of releasing the runtime lock, and strings, because it is still useful to digest a simple string or other small objects.
  • I just started a draft using bytes instead of cstruct/bigarray in mirage-crypto after checking its memory consumption: allocating small objects (2 or 4 bytes) via malloc can put huge pressure on the GC.
  • Obviously, a library such as parmap must use a bigarray as a buffer shared between processes to achieve true parallelism.

All of this raises a few questions:

  • Can we functorize the code over a common interface between bytes and bigarray?
  • Can we use GADTs to specialize some branches according to the buffer type?
  • Should we just make an arbitrary choice?

From my experiments, OCaml is not really able to specialize an implementation that is abstracted over a 'buffer type, whether via functors or GADTs. You probably have a better chance with flambda, which is more aggressive than vanilla OCaml. But in my experience, choosing bytes or bigarray arbitrarily is not a barrier to adoption, as long as the choice is consistent with your usage, and this is where it becomes hard to fully describe what you need :slight_smile: .
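The functor approach from the first question might look like the sketch below. The signature BUFFER and the modules Bytes_buffer, Bigstring_buffer and Checksum are hypothetical names chosen for illustration; the checksum is just a stand-in for any code you would want to write once for both representations.

```ocaml
(* Hypothetical minimal read-only interface shared by both buffer types. *)
module type BUFFER = sig
  type t
  val length : t -> int
  val get : t -> int -> char
end

module Bytes_buffer : BUFFER with type t = Bytes.t = struct
  type t = Bytes.t
  let length = Bytes.length
  let get = Bytes.get
end

module Bigstring_buffer : BUFFER
  with type t =
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t =
struct
  type t = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
  let length = Bigarray.Array1.dim
  let get = Bigarray.Array1.get
end

(* Code written once against BUFFER: sum of byte values, modulo 65536. *)
module Checksum (B : BUFFER) = struct
  let sum (buf : B.t) : int =
    let acc = ref 0 in
    for i = 0 to B.length buf - 1 do
      acc := (!acc + Char.code (B.get buf i)) land 0xFFFF
    done;
    !acc
end

module Bytes_checksum = Checksum (Bytes_buffer)
module Bigstring_checksum = Checksum (Bigstring_buffer)
```

The code is shared, but as noted above, without flambda each B.get is typically an indirect call through the functor argument rather than an inlined, specialized access.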

In any case, it’s hard to get the best of both worlds in a single type. Many of these characteristics are mutually exclusive because of the underlying design of the OCaml runtime. So I still say that it depends :stuck_out_tongue: .
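The GADT question raised above can be sketched with a witness type that tells a function which representation it was given. The type repr, its constructors and the functions length, get and count are hypothetical; this only illustrates the typing technique and makes no claim about the generated code.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical witness type: each constructor records which concrete
   representation the ['buf] parameter stands for. *)
type 'buf repr =
  | Bytes_buf : Bytes.t repr
  | Bigstring_buf : bigstring repr

(* Matching on the witness refines [buf], so each branch is well typed. *)
let length : type buf. buf repr -> buf -> int =
 fun repr buf ->
  match repr with
  | Bytes_buf -> Bytes.length buf
  | Bigstring_buf -> Bigarray.Array1.dim buf

let get : type buf. buf repr -> buf -> int -> char =
 fun repr buf i ->
  match repr with
  | Bytes_buf -> Bytes.get buf i
  | Bigstring_buf -> Bigarray.Array1.get buf i

(* Generic code over either representation: count occurrences of [c].
   The match inside [get] runs on every call unless the compiler
   manages to specialize it away. *)
let count (type buf) (repr : buf repr) (buf : buf) (c : char) : int =
  let n = ref 0 in
  for i = 0 to length repr buf - 1 do
    if get repr buf i = c then incr n
  done;
  !n
```

Both values still keep their own runtime representation; the GADT only lets one function accept either, which is consistent with the point that a single type cannot have every property at once.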


Thanks @dinosaure for your very detailed answer. You have highlighted and clarified a number of issues for me. As you say, the final answer is “it depends”, but I feel I can make a better choice now. Thanks


I was just wondering what the impact would be of using something like an LLVM Twine for handling buffers: instead of allocating a big chunk of memory up front, it would be possible to postpone the allocation and concatenation of the final buffer.

I see a very good use case for receiving network packets: it would be an immutable tree into which chunks are inserted in O(log n), and at the end the chunks could be concatenated into a larger buffer. My intuition (I haven’t yet had enough time to think this hypothesis through or test it) is that it would improve concurrency at high network speeds.
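The deferred-concatenation idea can be sketched as a minimal immutable rope of string chunks. This is a simplified, unbalanced variant, not LLVM’s Twine and not the O(log n) balanced tree described above; the type and function names are hypothetical. Appending is O(1), and the single allocation and copy happen once, in flatten.

```ocaml
(* Hypothetical rope: chunks are never copied until [flatten]. *)
type rope =
  | Empty
  | Leaf of string
  | Concat of rope * rope * int (* left, right, cached total length *)

let length = function
  | Empty -> 0
  | Leaf s -> String.length s
  | Concat (_, _, n) -> n

(* O(1) append: just a new node with the summed length. *)
let append a b =
  match a, b with
  | Empty, r | r, Empty -> r
  | _ -> Concat (a, b, length a + length b)

(* One allocation of the final size, then each leaf is blitted into place. *)
let flatten (r : rope) : string =
  let buf = Bytes.create (length r) in
  (* [fill off node] copies [node] at offset [off], returns the next offset. *)
  let rec fill off = function
    | Empty -> off
    | Leaf s ->
      Bytes.blit_string s 0 buf off (String.length s);
      off + String.length s
    | Concat (left, right, _) -> fill (fill off left) right
  in
  ignore (fill 0 r);
  (* Safe: [buf] is never mutated after this point. *)
  Bytes.unsafe_to_string buf
```

Because this sketch never rebalances, appending many chunks one by one yields a deep left spine and a deep recursion in flatten; a production version would rebalance, which is where the O(log n) insertion cost mentioned above comes from.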

My overly simplistic answer: if you need buffers that do not move in memory (e.g. to interface with C libraries), use bigstring; otherwise use bytes. As @dinosaure mentions in his detailed answer, bytes blows bigstring out of the water for small allocations, since bigstring is allocated using malloc, which is slow compared with a minor-heap allocation.

Cheers,
Nicolas


The problem is also that the stdlib doesn’t provide any IO primitives that work with bigstrings, besides mmap, does it? So unless you bring in dependencies, it’s not even on the table.
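To illustrate the point, this is roughly what filling a bigstring looks like using only the bytes-based Stdlib input function. The function name really_input_bigstring is hypothetical. Every chunk is read into an intermediate bytes buffer and then copied again, which is the extra copy that C stubs or a third-party library avoid.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: fill [dst] completely from [ic], raising End_of_file
   if the channel runs out first. *)
let really_input_bigstring (ic : in_channel) (dst : bigstring) : unit =
  let len = Bigarray.Array1.dim dst in
  let chunk = Bytes.create 4096 in
  let rec loop off =
    if off < len then begin
      (* Stdlib's [input] can only read into bytes. *)
      let n = input ic chunk 0 (min (Bytes.length chunk) (len - off)) in
      if n = 0 then raise End_of_file;
      (* The second copy, bytes -> bigstring. *)
      for i = 0 to n - 1 do
        Bigarray.Array1.unsafe_set dst (off + i) (Bytes.unsafe_get chunk i)
      done;
      loop (off + n)
    end
  in
  loop 0
```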


Yes, that’s right. This means that (at the moment) you either write your own C stubs (this is typically not very hard) or depend on a third-party library. There was a PR, “[RFC] Add Bigstring module” by nojb (ocaml/ocaml Pull Request #1961), that would have added Stdlib support for bigstrings, but it didn’t have enough support from the other devs at the time, and it wasn’t merged.

Cheers,
Nicolas


For MirageOS’s purposes, it is good to limit I/O primitives :slightly_smiling_face:. More generally, I prefer the approach where users bring what they really need themselves, which gives them a chance to specialize I/O primitives according to the context.

You can always find arguments that the current I/O primitives are not fast enough for you, and most of the time you are right. But I don’t think users always want a fast implementation; I think we mostly want something that works perfectly in every situation (which is harder to get, because it’s like walking a tightrope across many targets, each with its own specific and exclusive limitations). Hello, Windows.

In my opinion, OCaml’s runtime has good I/O primitives because they just work without a huge amount of complexity; maybe we can improve on them with the type system. But finding other primitives that fit particular contexts perfectly (mostly for performance) is the job of third-party libraries :slightly_smiling_face:, not of the compiler/runtime.
