Calling OCaml from C causes semi-random, silent crashes

I’m encountering some strange problems with interfacing C, OCaml, and Python.

The overall goal is to use some parsing functions written in OCaml from a Python Django site, bridging by way of C.

If I start up my Django site and go to a page that calls these OCaml functions in order to render, it works perfectly fine as long as I stay on the same page.

If I change pages or wait a few minutes and try to reload, the server will silently crash with no error messages.

Earlier I would get the “No domain lock” crash error under the same circumstances. According to the OCaml C interfacing documentation, this comes up in a multithreading context – which I’m not using at the Python, C, or OCaml level. I added code to manually acquire the runtime lock and that error message went away, but the behavior remains essentially the same.

I’m thinking I must be screwing something up with the OCaml garbage collector or unwittingly making a C error. I am also wondering if the OCaml runtime might be silently getting de-initialized somehow.

Here’s the C code:


#include <assert.h>
#include <caml/alloc.h>
#include <caml/callback.h>
#include <caml/fail.h>
#include <caml/memory.h>
#include <caml/misc.h>
#include <caml/mlvalues.h>
#include <caml/osdeps.h>
#include <caml/threads.h>
#include <caml/unixsupport.h>
#include <stdlib.h>
#include <string.h>

void __caml_init() {

  static int once = 0;

  if (once == 0) {
    char *argv[] = {"ocaml_startup", NULL};

    caml_startup(argv);

    once = 1;
  }
}

char *parse_lml_intern(char *str_in) {
  CAMLparam0();

  static const value *_parse_lml = NULL;
  if (_parse_lml == NULL)
    _parse_lml = caml_named_value("parse_lml");

  value ocaml_str_in = caml_copy_string(str_in);

  CAMLlocal1(result);
  result = caml_callback_exn(*_parse_lml, ocaml_str_in);
  assert(Tag_val(result) == String_tag);

  size_t result_len = caml_string_length(result);
  char *str_out = malloc(result_len);
  memcpy(str_out, String_val(result), result_len);
  str_out[result_len] = '\0';
  CAMLreturnT(char *, str_out);
}

char *parse_lml(char *str_in) {
  __caml_init();

  if (Caml_state_opt == NULL)
    caml_acquire_runtime_system();
  char *out = parse_lml_intern(str_in);

  caml_release_runtime_system();
  return out;
}

Relevant OCaml code:

let parse_lml_c lml =
  try
    Parse.parse_lml lml
  with
    Document.SyntaxError msg ->
     Printf.sprintf "Error: %s" msg
  | _ -> ""

let () = Callback.register "parse_lml" parse_lml_c

Python code:


def render_lml(lml, template):
    lml = bytes(lml, "utf-8")
    parsed_lml = lib.parse_lml(lml)
    print(ffi.string(parsed_lml))
    result = ffi.string(parsed_lml).decode("utf-8")
    # Need to explicitly free the c string now that we are done with it
    lib.free(parsed_lml)
    return result

I think I figured this out – the issue is with the bare value ocaml_str_in declaration – this value needs to be declared with CAMLlocal1.

I didn’t use that pattern because I was naively copying the examples from this otherwise good tutorial here: Calling OCaml from C

I figured out the correct pattern thanks to this blog post: Easy mistakes when writing OCaml C bindings - Brendan Long. Because this error is subtle, it took me a dozen read-throughs to notice which mistake I was making.

Since you mention that your real goal is calling OCaml from Python, I wondered if you’d tried any of the OCaml<->Python interfaces that others have written? Just thought I’d mention it. It’s always better to use somebody else’s code than to figure out how to write it oneself (unless one has a good reason for doing so, of course).

I found that the existing interfaces for calling OCaml from Python were incompatible with current versions of OCaml. My previous attempts to adapt them proved harder than just writing the bindings myself, since they were quite convoluted.

That seems dubious. Indeed, the value ocaml_str_in is consumed as soon as it is produced, so the garbage collector does not get a chance to mess with it. Thus, whichever bug there is in your code, your change is not fixing it; it is just making it less likely to crash.

The code that copies the result into str_out is buggy: malloc(result_len) leaves no room for the terminating NUL byte written by str_out[result_len] = '\0'; the argument to malloc should be result_len + 1.

Cheers,
Nicolas
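
To make the off-by-one concrete, here is a minimal sketch in plain C, with no OCaml runtime involved. The helper name copy_to_c_string is hypothetical and not part of the original code; it only illustrates that a string of len bytes needs len + 1 bytes of storage once the terminator is counted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the original code): copy `len` bytes from
 * `data` into a freshly malloc'ed, NUL-terminated C string.
 * Returns NULL if allocation fails; the caller must free() the result. */
char *copy_to_c_string(const char *data, size_t len) {
  assert(data != NULL);

  /* len bytes of payload plus 1 byte for the terminating '\0'.
   * With malloc(len), the write to out[len] below would land one byte
   * past the end of the allocation. */
  char *out = malloc(len + 1);
  if (out == NULL)
    return NULL;

  memcpy(out, data, len);
  out[len] = '\0';
  return out;
}
```

In parse_lml_intern the equivalent change is malloc(result_len + 1). A one-byte heap overrun like this often goes unnoticed until some later allocation or free, which would be consistent with crashes that only show up on a later request.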

Before we consume ocaml_str_in we’re creating another OCaml value with CAMLlocal1(result). I think this gives the GC a chance to run. If that’s not the case, then maybe it was fixing the length bug that @nojb pointed out.

In any case, according to the manual best practice is to always register value declarations with the macros.
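
For reference, a version of parse_lml_intern that registers every local value with the macros and allocates room for the terminator might look like the sketch below. It is untested here and needs the OCaml headers and runtime to build. The Is_exception_result check and the NULL return on failure are assumptions added for illustration; they are not part of the original code, and a caller (including the Python side) would have to check for NULL before using the result.

```c
#include <stdlib.h>
#include <string.h>
#include <caml/alloc.h>
#include <caml/callback.h>
#include <caml/memory.h>
#include <caml/mlvalues.h>

/* Sketch only: must be called with the OCaml runtime initialized and the
 * runtime lock held, as parse_lml does in the original code. */
char *parse_lml_intern(const char *str_in) {
  CAMLparam0();
  /* Register both OCaml values as GC roots before any allocation. */
  CAMLlocal2(ocaml_str_in, result);

  static const value *parse_lml_closure = NULL;
  if (parse_lml_closure == NULL)
    parse_lml_closure = caml_named_value("parse_lml");
  if (parse_lml_closure == NULL)
    CAMLreturnT(char *, NULL); /* callback was never registered */

  ocaml_str_in = caml_copy_string(str_in);
  result = caml_callback_exn(*parse_lml_closure, ocaml_str_in);
  if (Is_exception_result(result))
    CAMLreturnT(char *, NULL); /* assumption: report failure as NULL */

  size_t result_len = caml_string_length(result);
  char *str_out = malloc(result_len + 1); /* +1 for the terminating NUL */
  if (str_out != NULL) {
    memcpy(str_out, String_val(result), result_len);
    str_out[result_len] = '\0';
  }
  CAMLreturnT(char *, str_out);
}
```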

I think this gives the GC a chance to run. If that’s not the case, then maybe it was fixing the length bug that nojb pointed out.

No, CAMLlocal does not run the garbage collector. As noticed by @nojb, your length bug is most certainly the cause of the crashes, since your code indeed overwrites some random location in memory.

I stand corrected, then.