Foreign function overhead

I’m trying to call into a shared library using the ctypes Dl module, and I found that calling the function returned by Ctypes.coerce adds 150-160ns of overhead per call. For reference, doing the same through a C stub takes 8ns in the same benchmark.

let dl_bench =
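  let handle =
    (* The Dl.dlopen call and the library path were lost from the post; the filename stays elided. *)
    Dl.dlopen ~filename:"..." ~flags:[ Dl.RTLD_LAZY ]
  in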
  let sym = Dl.dlsym ~handle ~symbol:"plus_one" in
  let typ = Foreign.funptr Ctypes.(int @-> returning int) in
  let dlpo = Ctypes.(coerce (ptr void) typ (ptr_of_raw_address sym)) in
  assert (dlpo 4 = 5);
  Bench.Test.create ~name:"dlsym" (fun _ -> dlpo 4 |> ignore)

I’m having a hard time understanding what Ctypes.coerce actually does and how I could improve the performance.

Is there any way I could call a function loaded using Dl.dlsym with lower overhead?

Note: I assume the overhead is not from Dl.dlsym but from coerce itself.

Edit: Nearly the same overhead applies when using the ctypes Foreign interface.
Edit: On the other hand, it might be Foreign.funptr that causes the overhead.
Edit: I think I understand now: the generic FFI C stub is probably the cause of the overhead. I think I’ll opt for a less generic one, since I don’t need the generality. I’m going to leave this thread up in case someone stumbles on the same issue.

This overhead is there because the Foreign module in ctypes goes through the libffi dynamic invocation layer. If you want performance closer to a hand-written C stub, you should use the ctypes C stub generation mode (Cstubs). This requires a bit more build system integration, but you can find an example in avsm/ocaml-yaml on GitHub (an OCaml interface to the YAML 1.1 spec).
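As a rough illustration of what stub generation looks like for the plus_one function from the question, here is a minimal sketch. It assumes the ctypes.stubs library is available; the prefix "plus_one_stubs", the header name plus_one.h, and the helper names emit_c and emit_ml are hypothetical and not taken from the original post.

```ocaml
(* The bindings are described once, as a functor over the FOREIGN interface.
   The same description drives both stub generation and the final program. *)
module Bindings (F : Ctypes.FOREIGN) = struct
  open Ctypes
  open F

  (* plus_one is the C function from the question, assumed to be
     int plus_one(int). *)
  let plus_one = foreign "plus_one" (int @-> returning int)
end

(* Hypothetical prefix used to name the generated stub functions. *)
let prefix = "plus_one_stubs"

(* Emit the C half of the stubs to the given formatter. *)
let emit_c fmt =
  (* Hypothetical header, assumed to declare plus_one. *)
  Format.fprintf fmt "#include \"plus_one.h\"@\n";
  Cstubs.write_c fmt ~prefix (module Bindings)

(* Emit the OCaml half of the stubs to the given formatter. *)
let emit_ml fmt = Cstubs.write_ml fmt ~prefix (module Bindings)
```

In a real project a small generator executable would call emit_c and emit_ml at build time, the build system would compile both outputs, and the program would then apply Bindings to the generated OCaml module to get a plus_one that calls the C function through a compiled stub instead of libffi. The exact build rules depend on your setup and are not shown here.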


Another example is in directories; that’s the exact commit where we switched from libffi to stub generation.