I’ve just read this post, and it made me wonder how much better or worse OCaml fares next to Go in terms of FFI calls and finalisers. I’ve ported the provided benchmark code to OCaml and taken some measurements. I’ll try my best to post code snippets that are as focused as possible, instead of dumping everything here and creating too much noise. But first, numbers!
test | ocaml 5.0 | go 1.19.9 |
---|---|---|
native add | 0.3 ns | 0.3 ns |
foreign add | 1.5 ns | 60.4 ns |
c allocate & free | 4.0 ns | 71.6 ns |
allocate ∘ free | 17.2 ns | 133.3 ns |
alloc auto | 125.6 ns | 1180 ns |
alloc dummy | 131.0 ns | 1197 ns |
alloc custom | 109.4 ns | - |
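I benchmarked by passing the functions in question to a `bench` function, which ran them in a loop, `for _ = 1 to 1M do ignore (f ()) done`, between two `Unix.gettimeofday` calls and multiplied the elapsed time by `1000.0`. Since `Unix.gettimeofday` returns seconds and the loop runs 1M times, the result is in nanoseconds per call.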
For foreign/native add, I inlined the addition operation in the loop.
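The timing scheme described above can be sketched roughly as follows. This is a minimal reconstruction, not the exact benchmark code: the `~now` and `?iters` parameters are assumptions added for illustration, and `Sys.time` stands in for `Unix.gettimeofday` only so the snippet runs without linking the `unix` library.

```ocaml
(* Hypothetical helper: returns nanoseconds per call of [f].
   [now] is any clock returning seconds as a float. The post used
   Unix.gettimeofday; Sys.time is used below only to avoid linking unix. *)
let bench ?(iters = 1_000_000) ~now f =
  let t0 = now () in
  for _i = 1 to iters do
    ignore (f ())
  done;
  let t1 = now () in
  (* seconds -> nanoseconds per iteration; with iters = 1M this is the
     same as multiplying the elapsed seconds by 1000.0 *)
  (t1 -. t0) *. 1e9 /. float_of_int iters

let () =
  let ns = bench ~now:Sys.time (fun () -> 1 + 2) in
  Printf.printf "native add: %.1f ns/call\n" ns
```

Passing the operation as a closure adds call overhead of its own, which is presumably why the addition cases were inlined into the loop instead.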
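It looks like we’re observing a similar slowdown when moving from manual freeing to finalisers: comparing allocate ∘ free with alloc auto, roughly 7x for OCaml and 9x for Go, so close to an order of magnitude in both. It also looks like OCaml’s C calls are putting Go’s to complete shame, coming in roughly an order of magnitude faster in every case but the native one!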
The two approaches used in this blog post were manual memory management and finalisers. I’ve additionally added custom blocks to the mix, since that’s usually something you find used with abstract types e.g. in LLVM bindings.
For Go’s `Cstr` representation, I opted for the following in module `Cstr`:
type t = { pointer : ptr } [@@boxed]
and ptr = private int
I then used the `untagged` annotation on the `external` function declarations:
external alloc_ptr : unit -> (ptr [@untagged]) = "" "Alloc"
external free_ptr : (ptr [@untagged]) -> unit = "" "Free"
For the `Addition` FFI function, I used `untagged` as well as `noalloc`. For the finalization hooks, I wrote the following:
let mk_alloc fin () =
let answer = { pointer = alloc_ptr() } in
Gc.finalise fin answer;
answer
let free {pointer} = free_ptr pointer
let alloc = mk_alloc free
let alloc_dummy = mk_alloc ignore
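To see the finaliser mechanics in isolation, here is a small sketch that swaps the C stubs for pure-OCaml stand-ins. The counters `allocated` and `freed` and the fake `alloc_ptr`/`free_ptr` are assumptions made up for this illustration; no real memory is allocated, and the number of finalisers that have run by the time of the print depends on the GC.

```ocaml
(* Stand-ins for the C Alloc/Free stubs: hypothetical counters only. *)
type t = { pointer : int }

let allocated = ref 0
let freed = ref 0

let alloc_ptr () = incr allocated; !allocated
let free_ptr (_ : int) = incr freed

(* Same shape as the hooks in the post: build the record, then attach
   the finaliser to it. The record is heap-allocated because its field
   is computed at run time, which Gc.finalise requires. *)
let mk_alloc fin () =
  let answer = { pointer = alloc_ptr () } in
  Gc.finalise fin answer;
  answer

let free { pointer } = free_ptr pointer
let alloc = mk_alloc free

let () =
  for _i = 1 to 1_000 do
    ignore (alloc ())
  done;
  (* Force a full major collection so unreachable records get finalised. *)
  Gc.full_major ();
  Printf.printf "allocated %d, finalised %d\n" !allocated !freed
```

The point of the sketch is only that `free` is invoked by the GC rather than by the caller, which is where the extra per-allocation cost in the alloc auto row comes from.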
I found no equivalent of Go’s `runtime.KeepAlive` and honestly, unless I’m mistaken, it’s probably irrelevant in our case.
Now let’s take a look at the stub… I want to preface this by saying I know I should be using Ctypes and co. for my FFI needs, but I wanted to remain apples-to-apples as much as possible with Go.
For `Addition`, I just used `int`. The general advice seems to be to always prefer `intnat` so as not to cause truncation problems, but I wanted to remain as faithful as possible to the Go code, including potential overflows!
int Addition(int a, int b) { return a + b; }
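To make the truncation concern concrete, the following sketch models what a 32-bit C `int` would do to OCaml integers. It is only a model written in OCaml, not the actual stub: `addition_as_c_int` is a hypothetical name, and it assumes a 64-bit platform (63-bit OCaml `int`, 32-bit C `int`) with the usual two's-complement wrap-around, even though signed overflow is formally undefined behaviour in C.

```ocaml
(* Model of a C function taking and returning a 32-bit [int].
   Int32.of_int keeps the low 32 bits of its argument on 64-bit
   platforms, and Int32.add wraps modulo 2^32. *)
let addition_as_c_int (a : int) (b : int) : int =
  Int32.to_int (Int32.add (Int32.of_int a) (Int32.of_int b))

let () =
  (* Small values behave as expected. *)
  Printf.printf "%d\n" (addition_as_c_int 1 2);
  (* INT_MAX + 1 wraps around to INT_MIN in this model. *)
  Printf.printf "%d\n" (addition_as_c_int 0x7FFF_FFFF 1);
  (* Bits above the low 32 are dropped before the addition. *)
  Printf.printf "%d\n" (addition_as_c_int (1 lsl 32) 1)
```

With `intnat` in the stub, the argument and result would keep their full width, so the last two cases would no longer lose information.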
`Alloc` and `Free` go from `void` to `char*` and back. I haven’t used the `CAMLparam*`/`CAMLreturn*` macros for them because they don’t touch the OCaml GC. That’s the impression I got from the manual about when to use them.
void Free(char *p) { free(p); }
As for the final custom block approach, I defined the struct relying on the fact that static data is zeroed by default:
static const struct custom_operations custom = {
.identifier = "org.ocaml.discuss.custom",
.finalize = Free_custom,
};
With custom blocks, there was no reason not to replace `void` and `char*` with a proper `value` type, and to reflect that on the OCaml side:
type custom
external alloc_custom : unit -> custom = "Alloc_custom"
but I’m not really sure if I did it right in the stub:
#define Custom(t, v) (*((t*)Data_custom_val(v)))
void Free_custom(value v) {
free(Custom(char*, v));
}
value Alloc_custom(value _) {
CAMLparam1(_);
CAMLlocal1(v);
v = caml_alloc_custom_mem(&custom, sizeof(char*), BYTES);
Custom(char*, v) = Alloc();
CAMLreturn(v);
}
That’s all! Thanks for reading. Please let me know if I made any rookie mistakes with the stubs or benchmark code.