How do I pass an unsigned char * (an array of bytes representing binary data) from C to OCaml?

I have been able to pass a string argument from the C side to the OCaml side using a callback and caml_alloc_initialized_string, as instructed here (OCaml - Interfacing C with OCaml).

Now, I want to add a second argument: an array of bytes representing binary data. Is there a caml_alloc_ function I am missing for a sequence of bytes?

The manual says that string and a sequence of bytes use the same representation. So I tried the following stub:

char * message(char * typ, uint8_t * data)
{
  static const value * closure = NULL;
  if (closure == NULL) {
    closure = caml_named_value("message");
  }
  value msg_type = caml_alloc_initialized_string(strlen(typ), typ);
  value msg_data = caml_alloc_initialized_string(strlen(data), data);
  return strdup(String_val(caml_callback2(*closure, msg_type, msg_data)));
}

but trying to compile it gives me a warning:

warning: passing 'uint8_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not [-Wpointer-sign]

As the warning says, you are passing an argument of type uint8_t * to a function that expects a value of type char const *. For historical reasons, char and unsigned char (which uint8_t is an alias for here) are distinct types, even though they have the same size and memory representation in practice. So you can simply cast one pointer type to the other:

value msg_data = caml_alloc_initialized_string(strlen((char *)data), (char *)data);
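
To illustrate what the cast does, here is a minimal sketch in plain C that does not involve the OCaml runtime. The helper names as_char_ptr and byte_at are hypothetical and exist only for this example. The cast changes the pointer type, not the bytes it points to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: view a byte buffer as a char buffer.
   Only the pointer type changes; no byte is copied or modified. */
static const char *as_char_ptr(const uint8_t *data)
{
  return (const char *)data;
}

/* Hypothetical helper: read byte i back as an unsigned value,
   whether or not plain char is signed on this platform. */
static unsigned byte_at(const char *s, size_t i)
{
  return (unsigned char)s[i];
}
```

For a buffer holding the byte 0xFF, byte_at returns 255 after the cast, so nothing is lost by going through char *.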

By the way, you can use caml_copy_string instead of caml_alloc_initialized_string here.

value msg_data = caml_copy_string((char *)data);

Also, you really need to protect your variables of type value with CAMLlocal, since any memory allocation might trigger a garbage collection, which can move heap blocks and leave unprotected variables pointing at stale memory.

CAMLlocal2(msg_type, msg_data);
...
CAMLreturn(...);

As for the global variable closure, you would have to declare it as a garbage collector root. (EDIT: as pointed out by @vrotaru, its content does not need to be protected any further.)

To my understanding, this is not necessary for values retrieved with caml_named_value.
Callback.register should already have registered it as a root.

Thank you for the warning. In order to stick to only the sections relevant to my work, I had skipped the sections dealing with garbage collection (#5 and #6).

Thanks for the tip.

That’s correct. I make use of these in the interop.

A sample (which borrows much of the code from section #8, which did not talk about CAMLlocal and CAMLreturn) follows:

(* sample.ml *)

let say_hello name = match name with
| "" -> "Hello, world!"
| v -> "Hello, " ^ v ^ "!" 

let _ = Callback.register "say_hello" say_hello

// samplewrap.c

#include <stdio.h>
#include <string.h>
#include <caml/mlvalues.h>
#include <caml/callback.h>
#include <caml/alloc.h>

char * say_hello(char * s)
{
  static const value * closure = NULL;
  if (closure == NULL) {
    closure = caml_named_value("say_hello");
  }
  value str = caml_copy_string(s);
  return strdup(String_val(caml_callback(*closure, str)));
  /* We copy the C string returned by String_val to the C heap
     so that it remains valid after garbage collection. */
}

// samplewrap.h

char * say_hello(char * name);

// main.c

#include <stdio.h>
#include <caml/callback.h>

extern char * say_hello(char * name);
// extern char * message(char * typ, unsigned char * data);

int main(int argc, char ** argv)
{
  int result;

  caml_startup(argv);
  printf("%s\n", say_hello("Jayesh"));
  // printf("%s\n", message("greet", <bytes_array>));
  return 0;
}
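
One detail of the sample worth spelling out: strdup returns memory obtained from malloc, so every call to say_hello hands the caller a buffer that it should eventually free. The sketch below shows that ownership pattern in plain C. fake_say_hello is a hypothetical stand-in that builds the greeting in C instead of calling into OCaml, so the snippet compiles without the OCaml runtime. print_greeting is likewise a name made up for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* makes strdup visible in strict ISO C modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for say_hello: builds the greeting in C and
   returns a malloc'ed copy, as strdup does in the real stub.
   Names that do not fit in the local buffer are truncated. */
static char *fake_say_hello(const char *name)
{
  char buf[128];
  if (name[0] == '\0')
    snprintf(buf, sizeof buf, "Hello, world!");
  else
    snprintf(buf, sizeof buf, "Hello, %s!", name);
  return strdup(buf);
}

/* The caller owns the returned buffer and frees it after use. */
static void print_greeting(const char *name)
{
  char *msg = fake_say_hello(name);
  if (msg != NULL) {
    printf("%s\n", msg);
    free(msg);
  }
}
```

Applied to main.c, this would mean storing the result of say_hello in a variable, printing it, and then calling free on it.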

The casting got rid of the warning. I will use the code tonight and report back how it goes.

Besides the stuff @silene already mentioned, you should also use CAMLparam0() and CAMLreturnT macros.
Below is the modified code.

char * message(char * typ, uint8_t * data)
{
  CAMLparam0();
  CAMLlocal3(msg_type, msg_data, result);

  static const value * closure = NULL;
  if (closure == NULL) {
    closure = caml_named_value("message");
  }
  msg_type = caml_alloc_initialized_string(strlen(typ), typ);
  msg_data = caml_alloc_initialized_string(strlen((char *)data), (char *)data);

  result = caml_callback2(*closure, msg_type, msg_data);
  CAMLreturnT(char *, strdup(String_val(result)));
}

Maybe result is not necessary, but I included it just to be safe.

Protecting result is not necessary, since there is no point of allocation between its definition and its use. (And protecting msg_data is not necessary either, for the same reason.)

Here is my final code:

// bindings.h
char * message(char * typ, unsigned char * data, int data_length);

// bindings.c

char * message(char * typ, unsigned char * data, int data_length)
{
  CAMLparam0();
  CAMLlocal3(msg_type, msg_data, result);

  static const value * closure = NULL;
  if (closure == NULL) {
    closure = caml_named_value("message");
  }
  
  msg_type = caml_copy_string(typ);
  msg_data = caml_alloc_initialized_string(data_length, (char *) data);
  
  result = caml_callback2(*closure, msg_type, msg_data);

  CAMLreturnT (char *, strdup(String_val(result)));
}

I had forgotten to pass the length of the data as a parameter. It's needed because msg_data is a plain byte array, not a null-terminated string.
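
To make the difference concrete, here is a small sketch in plain C, independent of the OCaml runtime. The helper names length_seen_by_strlen and copy_bytes are hypothetical. It shows that a strlen-based length stops at the first zero byte, while an explicit length keeps the whole buffer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the length a string-based copy would see.
   It stops at the first zero byte, so binary data gets truncated.
   The buffer must contain at least one zero byte. */
static size_t length_seen_by_strlen(const unsigned char *data)
{
  return strlen((const char *)data);
}

/* Hypothetical helper: copy exactly len bytes, zero bytes included.
   Returns a malloc'ed buffer that the caller must free, or NULL. */
static unsigned char *copy_bytes(const unsigned char *data, size_t len)
{
  unsigned char *out = malloc(len ? len : 1);
  if (out != NULL)
    memcpy(out, data, len);
  return out;
}
```

For the five bytes 0x48 0x69 0x00 0x21 0x00, length_seen_by_strlen reports 2, whereas copy_bytes with an explicit length of 5 preserves all of them. This is the same reason the final stub passes data_length to caml_alloc_initialized_string.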

A question:

When should I prefer parameters of type value over normal C types (like the ones I used: int, unsigned char *)? The value type is what CAMLparam is needed for.

If the C function is called by an OCaml function, then it should take arguments of type value. If the C function is called from another language (including C), it should take arguments suitable for that language.
