Using Lwt to interleave reads and writes

I’m looking for advice on how best to implement a small tool that copies data from one file descriptor to another, using Lwt on Unix to interleave read and write operations. This needs to work for file descriptors where the size of the data is not known in advance. I would like to optimise throughput by carrying out read operations while data is being written, so simply reading a small block and then writing it in a loop is not what I am after.

  • I could implement a ring buffer using non-blocking IO that is shared between a reader and a writer thread. A possible advantage is that the memory requirements are explicit, but it requires handling offsets manually.
  • I could use a bounded Lwt_stream to pass buffers of 64 KiB (or so) between a reader and a writer thread. This is more abstract, but one might have to be careful about creating lots of short-lived buffers.
  • Your idea here.
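The second option can be sketched with `Lwt_stream.create_bounded`, which caps how far the reader can run ahead of the writer. Everything below (`copy`, `chunk_size`, the bound of 8) is an illustrative sketch under those assumptions, not code from any existing library:

```ocaml
open Lwt.Infix

let chunk_size = 64 * 1024

(* Copy from [src] to [dst], passing chunks through a bounded stream so
   at most [8 * chunk_size] bytes are buffered at any time. *)
let copy ~src ~dst =
  let stream, push = Lwt_stream.create_bounded 8 in
  let rec reader () =
    let buf = Bytes.create chunk_size in
    Lwt_unix.read src buf 0 chunk_size >>= fun n ->
    if n = 0 then (push#close; Lwt.return_unit)
    else push#push (Bytes.sub buf 0 n) >>= reader
  in
  (* [Lwt_unix.write] may write fewer bytes than asked, so loop. *)
  let write_all chunk =
    let rec go off len =
      if len = 0 then Lwt.return_unit
      else
        Lwt_unix.write dst chunk off len >>= fun n ->
        go (off + n) (len - n)
    in
    go 0 (Bytes.length chunk)
  in
  let writer () = Lwt_stream.iter_s write_all stream in
  Lwt.join [ reader (); writer () ]
```

Because `push#push` blocks once the stream holds 8 chunks, the reader automatically throttles itself to the writer's pace while still staying several buffers ahead.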

If it’s on Linux, then it sounds like you could just use a binding to sendfile(2) to splice the two fds together:

http://man7.org/linux/man-pages/man2/sendfile64.2.html

The problem with that, according to my colleagues, is that it does not work when you don’t know the size in advance. We tried it with large blocks and ran into problems with errors going unreported. But this was my initial reaction, too, and I’m not ruling out a problem in our code. Now I am considering connecting two processes with a pipe to decouple them.

Hello there. I have almost no experience with Lwt, so excuse me if my comment is not relevant, but looking at it from a higher-level point of view, you could perhaps solve this by treating it as an instance of the producer–consumer problem.

Using Lwt for concurrency probably works for things like sockets; for files on spinning disks it’s a little different, I fear. To wit, in order to get full throughput from a disk with a movable head and an elevator algorithm, you need to present enough IOPS to fill its queue. That’s very different from writing a block every time it’s ready to write, and the same is probably true of reading. Two different friends who wrote SAN systems, in C++ and OCaml, both gave me the same solution: create a pool of C threads (real, kernel threads) that perform the I/O operations, and communicate with the main thread via either mutex/condvar/semaphores or (better) loopback socketpairs.
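The thread-pool idea above can be approximated within Lwt itself using `Lwt_preemptive`, which runs blocking calls on a pool of real kernel threads. The sketch below issues several blocking reads in parallel so the disk sees a queue of requests; `read_at`, `reads`, and the offsets are hypothetical names chosen for illustration:

```ocaml
(* Allow up to 8 preemptive worker threads. *)
let () = Lwt_preemptive.set_bounds (0, 8)

(* Run one blocking positioned read on a worker thread. Each call opens
   its own descriptor so the lseek does not race with other workers. *)
let read_at path off len =
  Lwt_preemptive.detach
    (fun () ->
       let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
       Fun.protect
         ~finally:(fun () -> Unix.close fd)
         (fun () ->
            ignore (Unix.lseek fd off Unix.SEEK_SET);
            let buf = Bytes.create len in
            let n = Unix.read fd buf 0 len in
            Bytes.sub buf 0 n))
    ()

(* Fire several reads at once to keep the disk's queue full. *)
let reads path =
  Lwt_list.map_p (fun off -> read_at path off 65536)
    [ 0; 65536; 131072; 196608 ]
```

The point of `Lwt_list.map_p` here is that all four reads are in flight simultaneously, giving the disk scheduler something to reorder.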

Actually expressing your intentions to the hardware seems to be really important for disk drives. And this remains true even after you factor in kernel buffering, because eventually you fill your kernel buffers. Or, heck, you might want to disable kernel buffering altogether, because it means more copying, which gets in the way of performance.

I don’t know what the story is for SSDs, though. Everything I wrote above is for rotating disks with seeking heads.

If *all* you want to do is copy, forking processes to deal with reading and writing is probably going to be simpler than what I describe above.
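A minimal sketch of that fork-based variant, assuming plain Unix file descriptors: a child process pumps `src` into a pipe while the parent pumps the pipe into `dst`, so the kernel’s pipe buffer decouples reading from writing. `pump` and `copy_via_pipe` are illustrative names, not part of any library:

```ocaml
let buf_size = 64 * 1024

(* Copy everything from [src] to [dst]; [Unix.write] may be short, so
   keep writing until the whole chunk is out. *)
let pump ~src ~dst =
  let buf = Bytes.create buf_size in
  let rec loop () =
    let n = Unix.read src buf 0 buf_size in
    if n > 0 then begin
      let rec out off =
        if off < n then out (off + Unix.write dst buf off (n - off))
      in
      out 0;
      loop ()
    end
  in
  loop ()

let copy_via_pipe ~src ~dst =
  let r, w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* child: src -> pipe *)
    Unix.close r;
    pump ~src ~dst:w;
    Unix.close w;
    exit 0
  | pid ->
    (* parent: pipe -> dst; the pipe buffer absorbs bursts on either side *)
    Unix.close w;
    pump ~src:r ~dst;
    Unix.close r;
    ignore (Unix.waitpid [] pid)
```

Closing the unused pipe ends in each process matters: if the parent kept `w` open, its read loop would never see end-of-file.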


It’s worth digging in here to find out precisely what the problem was, since it’s difficult to beat a kernel-spliced socket from userspace. It could be that you were trying to go from socket to socket; sendfile(2) has a few restrictions and a reputation for being unreliable under some circumstances. You could also try splice(2) and its associated functions directly: http://man7.org/linux/man-pages/man2/splice.2.html

If all that fails, then there are examples of ring buffers using the Xen device protocol at https://github.com/mirage/shared-memory-ring (Xen-style shared memory rings), or a disk-persistent version (probably too conservative for your use case) at https://github.com/mirage/shared-block-ring (a simple on-disk fixed-length queue). Both have examples of Lwt patterns for interleaving.
