Is there a simple library that wraps a C malloc'd char* buffer?

I actually wrote one already (ocaml/buffer.c · master · nbdkit / libnbd · GitLab), but I need something similar elsewhere, and rather than copy the code I thought I’d check whether a suitable buffer type already exists.

I can’t find anything under opam - Packages, although I may be searching for the wrong term.

Ideally:

  • Minimal dependencies
  • Provides a bytes-like interface
  • Allows you to interact with it from C, so you can add an existing buffer from another C library to pass back to OCaml code

I can’t think of anything off the top of my head. You could use 1-dimension “char” bigarrays, but I guess you already considered this?

Cheers,
Nicolas
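
For illustration, here is a minimal sketch of what a bytes-like interface over a one-dimensional char bigarray could look like. The module name Buf and its functions are hypothetical, not taken from any existing library, and the buffer here is allocated from OCaml rather than wrapped from C. It assumes OCaml 4.07 or later, where Bigarray is part of the standard library.

```ocaml
(* Hypothetical module: a bytes-like view over a 1-dimensional char bigarray. *)
module Buf = struct
  type t =
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

  (* Allocate a fresh buffer of n bytes (contents are uninitialised). *)
  let create (n : int) : t =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout n

  let length (b : t) : int = Bigarray.Array1.dim b

  (* Bounds-checked accessors, mirroring Bytes.get and Bytes.set. *)
  let get (b : t) (i : int) : char = Bigarray.Array1.get b i
  let set (b : t) (i : int) (c : char) : unit = Bigarray.Array1.set b i c

  (* Copy a string into a new buffer. *)
  let of_string (s : string) : t =
    let b = create (String.length s) in
    String.iteri (fun i c -> Bigarray.Array1.unsafe_set b i c) s;
    b

  (* Copy the buffer contents out into a string. *)
  let to_string (b : t) : string =
    String.init (length b) (fun i -> Bigarray.Array1.unsafe_get b i)
end
```

A buffer handed over from C would have the same OCaml type, so functions like these would apply to it unchanged.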

It must be backed by a C malloc. Is that true of bigarrays? The intended use for this is wrapping buffers which are allocated in C and passed to OCaml code, replacing the copy here: plugins/ocaml/plugin.c · master · nbdkit / nbdkit · GitLab

You can do this if you allocate the bigarray from C, see OCaml - Interfacing C with OCaml. (Of course, if you allocate it from OCaml, the underlying buffer will be garbage collected).

Cheers,
Nicolas

I may have misunderstood you, but isn’t the buffer still allocated by malloc in that case? In stdlib/bigarray.ml, the create function appears to be a wrapper over caml_ba_create, and that function seems to call caml_ba_alloc. The documentation for the latter states that “[caml_ba_alloc] will allocate a new bigarray object in the heap. If [data] is NULL, the memory for the contents is also allocated (with [malloc]) by [caml_ba_alloc]. [data] cannot point into the OCaml heap.”

The garbage collection of the buffer seems to arise from the finalize function of the relevant custom block, not because the buffer is allocated on the OCaml heap. That’s as far as I understand it. Presumably if you don’t want finalization you could use an abstract block holding a malloc’ed buffer, or a user defined custom block which doesn’t finalize.

So the bigarray approach does seem possible. I will try to prototype something …

Mirage driver libraries do this all the time. You just need to mark the Bigarray as CAML_BA_EXTERNAL (see bigarray.h) and then OCaml won’t mess with the buffer backing the bigarray at all. Obviously, unmapping that memory needs to be done carefully and in cooperation with the OCaml GC, or things end badly.

There is a slight overhead with using Bigarray in this way, since there is a proxy C object allocated per buffer (used by Bigarray to track the slicing and references). That’s why we’ve been experimenting in the background with switching to bytes in OCaml 5 (see ocaml-uring#101). I’ve not had the hacking bandwidth in recent months to pursue this, but it’s on my mind :)

Yes, it is, but it is freed when the bigarray itself is garbage collected, which is not typically what you want when you have a pre-existing buffer that has been previously allocated in C.

Cheers,
Nicolas

I implemented this using Bigarray.

Unfortunately the generated code is very far from optimal. I was hoping that copying into the buffer would eventually resolve into rep movsb or at least a simple loop. Instead it ends up calling caml_c_call + caml_ba_set_1. See the disassembly of blit_bytes_to_buf here:

http://oirase.annexia.org/tmp/nbdkit-ocamlexample-plugin.so.txt.gz

Isn’t Bigarray.Array1.unsafe_set supposed to turn into a simple mov?

Try the compiler intrinsics directly as cstruct does?

You need to put type annotations everywhere, otherwise OCaml assumes the
general case. It’ll optimize the array accesses only if it knows the
shape of the array and data.
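
As a rough sketch of what “annotations everywhere” might look like for a blit from bytes into a char bigarray: the type alias char_buf and the explicit bounds check are assumptions added for illustration, and the exact signature used in the real code may differ. The point is that the element kind and layout are fixed in the type, so the compiler can specialise the unsafe_set.

```ocaml
(* Hypothetical alias: fixing the element kind and the layout lets the
   compiler specialise bigarray accesses instead of calling into C. *)
type char_buf =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

let blit_bytes_to_buf (src : bytes) (src_pos : int)
    (buf : char_buf) (buf_pos : int) (len : int) : unit =
  (* Check the ranges once up front, since the loop uses unsafe accessors. *)
  if len < 0
     || src_pos < 0 || src_pos > Bytes.length src - len
     || buf_pos < 0 || buf_pos > Bigarray.Array1.dim buf - len
  then invalid_arg "blit_bytes_to_buf";
  for i = 0 to len - 1 do
    Bigarray.Array1.unsafe_set buf (buf_pos + i)
      (Bytes.unsafe_get src (src_pos + i))
  done
```

Without the annotation on buf, the function would be polymorphic in the kind and layout, which is the general case mentioned above.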

Yes, that’s a bit better with the annotations. We’re down to just a loop:

   acdaa:       b8 01 00 00 00          mov    $0x1,%eax
   acdaf:       48 8b 1c 24             mov    (%rsp),%rbx
  for i = 0 to len-1 do
   acdb3:       48 83 c3 fe             add    $0xfffffffffffffffe,%rbx
   acdb7:       48 39 d8                cmp    %rbx,%rax
   acdba:       7f 4b                   jg     ace07 <camlNBDKit.blit_bytes_to_buf_81+0xe7>
   acdbc:       48 8b 7c 24 08          mov    0x8(%rsp),%rdi
      (Bytes.unsafe_get src (src_pos+i))
   acdc1:       48 8d 7c 07 ff          lea    -0x1(%rdi,%rax,1),%rdi
   acdc6:       48 d1 ff                sar    $1,%rdi
   acdc9:       48 8b 74 24 10          mov    0x10(%rsp),%rsi
   acdce:       48 0f b6 3c 3e          movzbq (%rsi,%rdi,1),%rdi
   acdd3:       48 8d 7c 3f 01          lea    0x1(%rdi,%rdi,1),%rdi
   acdd8:       48 8b 74 24 18          mov    0x18(%rsp),%rsi
    Bigarray.Array1.unsafe_set buf (buf_pos+i)
   acddd:       48 01 c6                add    %rax,%rsi
   acde0:       48 d1 fe                sar    $1,%rsi
   acde3:       48 8b 54 24 20          mov    0x20(%rsp),%rdx
   acde8:       48 8b 52 08             mov    0x8(%rdx),%rdx
   acdec:       48 d1 ff                sar    $1,%rdi
   acdef:       40 88 7c 32 ff          mov    %dil,-0x1(%rdx,%rsi,1)
   acdf4:       48 89 c7                mov    %rax,%rdi
   acdf7:       48 83 c0 02             add    $0x2,%rax
   acdfb:       48 39 df                cmp    %rbx,%rdi
   acdfe:       74 07                   je     ace07 <camlNBDKit.blit_bytes_to_buf_81+0xe7>
   ace00:       4d 3b 3e                cmp    (%r14),%r15
   ace03:       77 b7                   ja     acdbc <camlNBDKit.blit_bytes_to_buf_81+0x9c>
   ace05:       eb 0b                   jmp    ace12 <camlNBDKit.blit_bytes_to_buf_81+0xf2>
   ace07:       b8 01 00 00 00          mov    $0x1,%eax
   ace0c:       48 83 c4 30             add    $0x30,%rsp
   ace10:       5d                      pop    %rbp
   ace11:       c3                      ret

But I’ll probably just open code these functions in C.

Thanks everyone, upstream here: ocaml: Implement zero-copy pread and pwrite (25ffb743) · Commits · nbdkit / nbdkit · GitLab