Static CStubs for higher-order functions

Hello,
I’m trying to write bindings for the Clang parser and I’m stuck trying to pass a visitor function created from OCaml. The generated function stub takes a static_funptr, and I’d like to be able to pass an OCaml function instead.

Specifically, I want to bind this function:

typedef
  enum CXChildVisitResult
     (*CXCursorVisitor)(CXCursor cursor, CXCursor parent, CXClientData client_data);

unsigned clang_visitChildren(
  CXCursor parent, CXCursorVisitor visitor, CXClientData client_data);

And I would like OCaml users to pass functions of type

Cursor.t -> Cursor.t -> Client_data.t -> Child_visitor.Result.t

The generated stub function needs an input of type (Cursor.t -> Cursor.t -> Client_data.t -> Child_visitor.Result.t) static_funptr, which makes sense.
I’m not sure that what I’m trying to achieve is doable, since OCaml function pointers aren’t… static. But I’m happy with any suggestion on how to proceed.

Basically it boils down to: how does one bind higher-order functions? The example from Real World OCaml seems to be dynamically linking…

Ah, from Real World OCaml, there is actually this sentence:

C rarely makes life easier though. There are some definitions that cannot be entirely expressed as static C code (e.g. dynamic function pointers), and those require the use of ctypes-foreign (and libffi ). Using ctypes does make it possible to share the majority of definitions across both linking modes, all while avoiding writing C code directly.

Which I guess answers my question. I’m curious to see what the cost is going to be.

If I were writing these bindings by hand, I would define a static C function to be passed to clang_visitChildren; the value representing the higher-order function would be passed at the same time as client_data; the static function would then call the OCaml function received via the client_data argument using caml_callback*. I guess the same strategy can be used with Ctypes (if Ctypes gives you a facility to implement a static C function; otherwise perhaps a bit of C may be needed).
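To make the strategy above concrete, here is a minimal sketch in plain C with no OCaml or libclang involved. Everything in it is a stand-in: cursor_t, visit_children and closure_t are hypothetical names that play the roles of CXCursor, clang_visitChildren and the OCaml function value. It only shows the shape of the trick: one static visitor, with the real callback travelling through client_data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libclang types. */
typedef struct { int id; } cursor_t;
typedef enum { VISIT_BREAK, VISIT_CONTINUE } visit_result_t;
typedef visit_result_t (*visitor_fn)(cursor_t cursor, cursor_t parent,
                                     void *client_data);

/* Hypothetical stand-in for clang_visitChildren: visits three fake
   children and returns nonzero if the visitor stopped the traversal. */
static unsigned visit_children(cursor_t parent, visitor_fn visitor,
                               void *client_data) {
  for (int i = 1; i <= 3; i++) {
    cursor_t child = { parent.id * 10 + i };
    if (visitor(child, parent, client_data) == VISIT_BREAK) return 1;
  }
  return 0;
}

/* A "closure": code pointer plus captured environment. In a real
   binding this role is played by the OCaml function value. */
typedef struct {
  visit_result_t (*call)(void *env, cursor_t cursor, cursor_t parent);
  void *env;
} closure_t;

/* The single static visitor: it recovers the closure from client_data
   and forwards the call to it. */
static visit_result_t generic_visitor(cursor_t cursor, cursor_t parent,
                                      void *client_data) {
  closure_t *closure = client_data;
  return closure->call(closure->env, cursor, parent);
}

/* Example closure body: add each child id to the int stored in env. */
static visit_result_t sum_ids(void *env, cursor_t cursor, cursor_t parent) {
  (void)parent;
  *(int *)env += cursor.id;
  return VISIT_CONTINUE;
}

/* Usage: sum the ids of the fake children of a parent cursor. */
int sum_child_ids(int parent_id) {
  int total = 0;
  closure_t closure = { sum_ids, &total };
  cursor_t parent = { parent_id };
  unsigned stopped = visit_children(parent, generic_visitor, &closure);
  assert(stopped == 0); /* sum_ids never breaks the traversal */
  return total;
}
```

The visitor signature loses nothing by this: the client_data slot is simply used up by the binding, which is why the user-facing callback type can drop that argument.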

Incidentally, I guess you have looked at clangml (thierry-martinez/clangml on GitHub), the OCaml bindings for Clang?

Cheers,
Nicolas

This sounds like a very clever solution, and will probably work! I will try to implement this. I’ll try to understand how to add C files to the dune recipe.

I did take a look at clangml, but it does not compile on OCaml 5. I couldn’t find instructions on how to build the project from scratch (only from “bootstrapped versions”, with no information on how to bootstrap). In addition, my needs are much simpler than binding the entire Clang library: I just need a parser for C that can do some preprocessing and supports partially resolved macros (including #include directives). I do not need the massive AST, and I thought I could implement a thinner layer. That being said, I may not have tried hard enough…

Does it have to be Clang? Several OCaml projects have implemented parsers for C.

It should be possible to implement something equivalent using ctypes’s inverted stubs generation interface which supports building C libraries that expose OCaml functions. There’s an example here, which uses Cstubs_inverted to expose some functions from the OCaml library Xmlm as a C API. I suspect this approach will involve a bit of work on the user’s part, though (and I’m not sure dune supports it yet).

When we measured it in 2017, the overhead for calls made from OCaml to C through libffi was a few hundred nanoseconds per call, while the overhead for static function calls was less than 10 nanoseconds per call. Figures 19 and 20 of A modular foreign function interface have the details.

I couldn’t find a single one that supports non-preprocessed or partially-preprocessed code and is somewhat resilient to syntax errors. I am trying to analyse code while the developer is typing in their editor.
All OCaml projects I have found have no support for that, except for the tree-sitter bindings but I haven’t had luck using existing bindings either… On top of that, clang has the nice feature of performing the preprocessing when calling the parser API…

That is really cool! I’ll take a look, but that is probably the correct solution :)
I’ll post an update on this thread for whoever is interested, but I’ll have to start working on that again after POPL…

Looking at what CXClientData is, I see that it is just a typedef for void *: “Opaque pointer representing client data that will be passed through to various callbacks and visitors.” This is a common pattern for dealing with the lack of closures in C.

So you can easily pass the actual function in client data and have the following:

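/* Val_cursor is assumed to be a user-written helper that boxes a CXCursor as an OCaml value. */
enum CXChildVisitResult genericVisitor(CXCursor cursor, CXCursor parent, CXClientData client_data)
{
  CAMLparam0();
  CAMLlocal4(callback, v_cursor, v_parent, result);
  /* Only safe if the GC cannot move the closure during the traversal. */
  callback = (value) client_data;

  /* Root each boxed cursor before the next allocation can trigger a GC. */
  v_cursor = Val_cursor(cursor);
  v_parent = Val_cursor(parent);
  result = caml_callback2(callback, v_cursor, v_parent);
  CAMLreturnT(enum CXChildVisitResult, Int_val(result));
}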

And the signature of OCaml visitors can be simplified to

Cursor.t -> Cursor.t -> Child_visitor.Result.t

Yes, I’m currently trying to achieve that; this is also what I got from @nojb’s solution.
Since the rest of my bindings use ctypes, I’m trying @yallop’s approach of inverted stubs generation, but it looks like dune is not quite made for this, so I’m fighting a bit. I’ll get there :)

Can you use Root.create from ctypes?

(* To prevent garbage collection of closure which is passed as callback. *)
let add_visitor parent visitor_name visitor =
  let root = Root.create visitor in
  clang_visitChildren parent visitor Ctypes.null;
  Root.release root

Update: I ended up writing this binding:

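enum CXChildVisitResult clarse_genericVisitor(CXCursor cursor, CXCursor parent, CXClientData client_data)
{
  CAMLparam0();
  CAMLlocal4(v_cursor, v_parent, callback, result);

  /* Register the copies as local roots: the second allocation may trigger a GC. */
  v_cursor = ctypes_copy_bytes(&cursor, sizeof(CXCursor));
  v_parent = ctypes_copy_bytes(&parent, sizeof(CXCursor));

  /* client_data points to the root created by Root.create. */
  callback = *(value *)client_data;
  result = caml_callback2(callback, v_cursor, v_parent);

  CAMLreturnT(enum CXChildVisitResult, Long_val(result));
}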
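A note on the *(value *)client_data dereference in the binding above, with a small simulation in plain C. As far as I understand, a root hands C a pointer to a stable slot that holds the OCaml value, and the GC is free to move the value as long as it updates the slot. The sketch below mimics only that indirection; fake_block_t, root_create, root_release and fake_gc_move are hypothetical names and this is not how ctypes or the OCaml runtime are implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an OCaml heap block holding visitor state. */
typedef struct { int visits; } fake_block_t;

/* Allocate a stable slot that holds the block's current address. */
static fake_block_t **root_create(fake_block_t *block) {
  fake_block_t **slot = malloc(sizeof *slot);
  if (slot == NULL) abort();
  *slot = block;
  return slot;
}

static void root_release(fake_block_t **slot) { free(slot); }

/* Mimic a moving collector: copy the block elsewhere, update the slot,
   and free the old copy. The slot's own address never changes. */
static void fake_gc_move(fake_block_t **slot) {
  fake_block_t *moved = malloc(sizeof *moved);
  if (moved == NULL) abort();
  memcpy(moved, *slot, sizeof *moved);
  free(*slot);
  *slot = moved;
}

/* Callback side: dereference client_data on every call so that the
   current address of the block is used, not a stale one. */
static void on_visit(void *client_data) {
  assert(client_data != NULL);
  fake_block_t *block = *(fake_block_t **)client_data;
  block->visits++;
}

/* Usage: the callback keeps working after the block has been moved. */
int visits_across_move(void) {
  fake_block_t *block = malloc(sizeof *block);
  if (block == NULL) abort();
  block->visits = 0;

  fake_block_t **slot = root_create(block);
  on_visit(slot);
  fake_gc_move(slot); /* the old `block` pointer is stale from here on */
  on_visit(slot);

  int visits = (*slot)->visits;
  free(*slot);
  root_release(slot);
  return visits;
}
```

Passing the raw value itself as client_data would correspond to keeping the stale `block` pointer here, which is why the extra indirection matters.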

My goal is then to bind clang_visitChildren and pass this function, with the visitor passed as client data using Root.create as proposed by @vrotaru.
Unfortunately, I’m still fighting with dune because it’s not finding my genericVisitor:

libclang__c_cout_generated_functions__Function_description__Functions.c:136:35: error: call to undeclared function 'clarse_genericVisitor'; ISO C99 and later do not support implicit function declarations

I tried adding manual_bindings/stubs.h to the “header” field of ctypes but I get

libclang__c_cout_generated_types.c:2:10: error: 'manual_bindings/stubs.h' file not found with <angled> include; use "quotes" instead
    2 | #include <manual_bindings/stubs.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
      |          "manual_bindings/stubs.h"

I.e. dune automatically inserts the include using angled brackets and I’m not sure there’s a way to tell it to fetch the header locally… I’ll keep looking tomorrow

(headers
   (preamble
    "#include <clang-c/Index.h>\n#include \"manual_bindings/stubs.h\""))

in the ctypes field did it :)

Now it seems that my generic visitor does not pass valid values to the callback.

So… I’ve been fighting for two days with dune to find the right formulas to generate both the normal bindings and reverse bindings for a generic visitor.
I ended up writing these dune files: 1, 2, 3, 4, 5.

They are probably very non-optimised as they keep copying things left and right, but at least it compiles.

The important code is:

(* reverse bindings *)
open Ctypes
open Clarse_types
open Foreign

let clarse_generic_visitor cursor parent callback =
  let callback =
    from_voidp
      (funptr (Cursor.t @-> Cursor.t @-> returning Child_visitor.Result.t))
      callback
  in
  !@callback cursor parent

module Stubs (I : Cstubs_inverted.INTERNAL) = struct
  let () = I.structure Clarse_types.Cursor.t

  let () =
    I.internal "clarse_genericVisitor"
      (Cursor.t @-> Cursor.t @-> ptr void @-> returning Child_visitor.Result.t)
      clarse_generic_visitor
end

But then when calling the visitor like this:

let visit (cursor : Types.Cursor.t) (visitor : t) : termination =
  let generic_visitor = get_generic_visitor () in
  let v_root = Ctypes.Root.create visitor in
  let result = clang_visit_children cursor generic_visitor v_root in
  Ctypes.Root.release v_root;
  if result = Unsigned.UInt.zero then Finished else Break

I get

Exception: Ctypes_static.Unsupported "libffi does not support passing arrays".

because I’m still obtaining a function pointer whose parameters include cursors passed by value, and libffi apparently doesn’t like that…
@yallop do you know if there is any way out of this?

I’ll try writing the bindings manually again…
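For what it’s worth, CXCursor has an array member (const void *data[3]), which is presumably what the libffi error above is about. One possible way out, offered as an untested assumption rather than a confirmed fix, is to keep the by-value signature in a static C shim and give the dynamically created callback pointers to the cursors instead. The sketch below is plain C: cursor_t mirrors the shape of CXCursor, and pointer_visitor_fn, by_value_shim and stop_on_kind_7 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the same shape as CXCursor from <clang-c/Index.h>. */
typedef struct {
  int kind;             /* enum CXCursorKind in libclang */
  int xdata;
  const void *data[3];  /* the array member */
} cursor_t;

/* Same numbering as enum CXChildVisitResult. */
typedef enum { VISIT_BREAK, VISIT_CONTINUE, VISIT_RECURSE } visit_result_t;

/* Callback type with only pointer parameters and an int result, so no
   struct has to be passed by value across the dynamic call. */
typedef int (*pointer_visitor_fn)(const cursor_t *cursor,
                                  const cursor_t *parent);

/* Static shim with the by-value signature the C API expects.
   client_data points to a variable holding the pointer-based callback. */
static visit_result_t by_value_shim(cursor_t cursor, cursor_t parent,
                                    void *client_data) {
  assert(client_data != NULL);
  pointer_visitor_fn callback = *(pointer_visitor_fn *)client_data;
  return (visit_result_t)callback(&cursor, &parent);
}

/* Example callback: stop the traversal at a cursor of kind 7. */
static int stop_on_kind_7(const cursor_t *cursor, const cursor_t *parent) {
  (void)parent;
  return cursor->kind == 7 ? VISIT_BREAK : VISIT_CONTINUE;
}

/* Usage: run the shim once on a cursor of the given kind. */
int shim_result_for_kind(int kind) {
  pointer_visitor_fn callback = stop_on_kind_7;
  cursor_t cursor = { kind, 0, { NULL, NULL, NULL } };
  cursor_t parent = { 0, 0, { NULL, NULL, NULL } };
  return by_value_shim(cursor, parent, &callback);
}
```

Note that the pointers handed to the callback are only valid for the duration of the call, so a callback that wants to keep a cursor would have to copy it.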