Building an OCaml list of strings from C

Hi there, I’m currently trying to wrap my head around C interop while adding features to ocaml-ssl, and my code is segfaulting.

Disclaimer: I’m a beginner in C, so my code will most likely look funny to you, but please bear with me (and possibly suggest improvements).

Basically, I have this data structure:

const unsigned char *protocol_buffer = {
  2, 'h', '2',
  8, 'h', 't', 't', 'p', '/', '1', '.', '1'
};

And I want to transform this into an OCaml string list of ["h2"; "http/1.1"].

I currently have this code:

static value build_alpn_protocol_list(const unsigned char *protocol_buffer, unsigned int len)
{
  value protocol_list;

  int idx = 0;
  int list_len = 0;
  while (idx < len)
  {
    list_len++;
    int proto_len = (int) protocol_buffer[idx++];
    idx += proto_len;
  }
  protocol_list = caml_alloc(list_len, 0);
  idx = 0;
  int list_idx = 0;
  while (idx < len)
  {
    int proto_len = (int) protocol_buffer[idx++];
    char proto[proto_len + 1];
    int i;
    for (i = 0; i < proto_len; i++)
    {
      proto[i] = (char) protocol_buffer[idx++];
    }
    proto[proto_len] = '\0';
    value p = caml_copy_string(proto);
    Store_field(protocol_list, list_idx++, p);
  }

  return protocol_list;
}

Note the value p in there: if I call printf("%s\n", String_val(p)) inside the loop, it prints both the h2 and http/1.1 strings correctly.

I managed to pass this protocol_list into an OCaml callback, but it segfaulted when I tried to call List.length on it. Assuming the output from ocamldebug is correct, it segfaulted on this line:

Time: 255 - pc: 10252 - module List
20   | _::l -> <|b|>length_aux (len + 1) l

My naive guess is that it is something related to GC on protocol_list. (It might also be related to how I am passing and calling the callback, but I’d stop here first to isolate the problem. If you need me to provide the code that passes the callback do let me know).

Does anyone have any idea how to solve this? Thanks in advance!

One problem right off the bat is that you’re building an array of strings, not a list. An OCaml list of strings looks like a Lisp list: a linked chain of nodes, each a block of size 2. The first field of each node holds the string and the second field holds a reference to the next node. In the last node, the second field is [], which is represented as the immediate value 0 (Val_emptylist in C).

You also are not following the protocol for marking your value variables using CAMLlocalN() and returning values with CAMLreturn(). If there were a GC during the allocation of your strings, things could go wrong. But this is unlikely because you’re allocating very little.

Could you please elaborate on that? My understanding is that I’m building a string (char array) proto and putting it in the protocol_list “structured block” (as the manual calls it) via Store_field. Before this, I successfully parsed an OCaml list into an array via calls to Field, so I thought I was doing the opposite now.

Indeed, I have also tried replacing value protocol_list with the following:

CAMLparam0();
CAMLlocal1(protocol_list);

and returning with CAMLreturn(protocol_list), but the segfault still happens.

You are making a block with two fields, each of which contains a string. This isn’t what a list looks like in OCaml.

A list looks the way I described. In your case it will consist of two blocks with two fields each. The two fields of the first block are “h2” and the second block. The two fields of the second block are “http/1.1” and the immediate value 0.

It’s a linked list, not a vector.
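
To make the shape concrete, here is a rough model in plain C. It does not use the OCaml runtime at all: struct cell, list_length and the use of NULL in place of [] are hypothetical stand-ins, chosen only to show why code that follows the second field of each node (as List.length does) cannot walk a flat two-field block of strings.

```c
#include <stddef.h>

/* Hypothetical plain-C model of an OCaml list node ("cons cell").
   This is NOT the runtime representation, only its shape:
   field 0 holds the element, field 1 refers to the rest of the list. */
struct cell {
  const char *head;         /* the string stored in this node */
  const struct cell *tail;  /* next node, or NULL standing in for [] */
};

/* Walk the chain by following the second field until the end marker,
   which is the same traversal List.length performs on a real list. */
static int list_length(const struct cell *l)
{
  int n = 0;
  for (; l != NULL; l = l->tail)
    n++;
  return n;
}
```

In this model, ["h2"; "http/1.1"] is two cells: the first holds "h2" and points at the second, and the second holds "http/1.1" and has NULL as its tail. A single block holding both strings has a string where the traversal expects the next node.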

(As a side comment, I had to change const unsigned char *protocol_buffer to const unsigned char protocol_buffer[] to get your code to compile.)

Ah, okay, I think I see what you’re referring to. It makes sense now. Let me see if I can do it correctly. And to confirm my understanding: if I change my callback to accept an OCaml array instead of a list and leave the C function as is, it should work, correct?

I really appreciate the insight.

It would be extremely close to correct. I would have to review the OCaml manual to confirm how arrays are represented. In particular, I don’t remember how the tags are supposed to work. (You are using a tag of 0, which is very likely correct.)

I got it working! For the record, here’s what I came up with:

static value build_alpn_protocol_list(const unsigned char *protocol_buffer, unsigned int len)
{
  CAMLparam0();
  CAMLlocal3(protocol_list, current, tail);

  int idx = 0;
  protocol_list = Val_emptylist;

  while (idx < len)
  {
    int proto_len = (int) protocol_buffer[idx++];
    char proto[proto_len + 1];
    int i;
    for (i = 0; i < proto_len; i++)
      proto[i] = (char) protocol_buffer[idx++];
    proto[proto_len] = '\0';

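    /* Allocate a fresh list node: field 0 is the head, field 1 is the tail. */
    tail = caml_alloc(2, 0);
    Store_field(tail, 0, caml_copy_string(proto));
    Store_field(tail, 1, Val_emptylist);

    /* Link the node in: it is either the first node or follows current. */
    if (protocol_list == Val_emptylist)
      protocol_list = tail;
    else
      Store_field(current, 1, tail);

    /* Remember the last node so the next one can be appended to it. */
    current = tail;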
  }

  CAMLreturn(protocol_list);
}

Thanks for the help @jeffsco. (Also, do tell me if you have any concerns about my final solution.)
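
One possible concern: the loop trusts every length prefix. If a prefix claims more bytes than remain in the buffer, the copy loop reads past the end of protocol_buffer. Below is a minimal sketch of a check that could run before building the list. It is plain C with no OCaml runtime calls, and count_protocols is a hypothetical helper name, not part of ocaml-ssl.

```c
#include <stddef.h>

/* Count the entries in a length-prefixed buffer: one length byte, then
   that many payload bytes, repeated. Returns -1 if an entry claims more
   bytes than remain, so the caller can refuse to build a list from it.
   count_protocols is a hypothetical helper, not part of ocaml-ssl. */
static int count_protocols(const unsigned char *buf, size_t len)
{
  size_t idx = 0;
  int count = 0;

  while (idx < len) {
    size_t proto_len = buf[idx++];  /* read the length prefix */
    if (proto_len > len - idx)      /* payload would run past the end */
      return -1;
    idx += proto_len;               /* skip over the payload */
    count++;
  }
  return count;
}
```

Using size_t for the index also avoids the signed/unsigned comparison between int idx and unsigned int len in the loop condition.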

A simpler solution, though less interesting if your goal is to learn the foreign function interface, would be to return protocol_buffer directly to OCaml as a string and construct your string list in OCaml.

That is interesting. Seeing as protocol_buffer is an unsigned char array with length-prefix integers interspersed between the chars, could I just pass it as is to OCaml and treat it as a string?

It should not be a problem. I seem to remember that in recent compilers you need to use Bytes_val instead of String_val to access unsigned char, but I cannot remember whether there was any subtlety in how to return them to OCaml. Have a look at the “Interfacing with C” section of the OCaml manual (I am on mobile, so I don’t have it at hand right now).