Freeing uninitialized data when calling caml_c_thread_unregister on the main thread

I have an ugly crash in the nbdkit OCaml bindings that is very difficult to debug.

The crash only happens when using glibc’s memory debugging features, specifically:

LD_PRELOAD=libc_malloc_debug.so.0 GLIBC_TUNABLES=glibc.malloc.check=1:glibc.malloc.perturb=42

This fills newly allocated and freed memory with a repeated byte pattern, so freeing an uninitialized pointer (as happens here) crashes reliably.

When I do this, some plugins crash in thread_detach_from_runtime → caml_free_signal_stack, apparently while freeing the uninitialized th->signal_stack field.

The call trace goes through this function:

I believe what is happening in that close_wrapper function is that we are calling caml_c_thread_unregister on the “main thread” (not sure of the correct terminology) that is allocated by OCaml here:

The question is: how can I tell that this is the main thread, and therefore that it shouldn’t be unregistered?

Or maybe caml_c_thread_unregister should be a no-op if called on the main thread?

Or maybe new_thread->signal_stack should be explicitly initialized to NULL rather than left uninitialized on the main thread?

Applying the patch below fixes the crash, but I’m not sure if it’s the correct way to fix this.

diff --git a/otherlibs/systhreads/st_stubs.c b/otherlibs/systhreads/st_stubs.c
index 5d08ea03c0..e50376cb7d 100644
--- a/otherlibs/systhreads/st_stubs.c
+++ b/otherlibs/systhreads/st_stubs.c
@@ -502,6 +502,7 @@ static void caml_thread_domain_initialize_hook(void)
   new_thread->prev = new_thread;
   new_thread->backtrace_last_exn = Val_unit;
   new_thread->memprof = caml_memprof_main_thread(Caml_state);
+  new_thread->signal_stack = NULL;
 
   st_tls_set(caml_thread_key, new_thread);

I finally have a minimal test case:

hello.ml contains:

let () = prerr_endline "hello"

test.c contains:

#include <caml/callback.h>
#include <caml/threads.h>

int
main (int argc, char *argv[])
{
  caml_startup (argv);
  caml_release_runtime_system ();
  caml_c_thread_register ();
  caml_c_thread_unregister ();
  exit (0);
}

Compile as follows:

$ ocamlopt.opt -g -output-obj -I +unix -I +threads unix.cmxa threads.cmxa hello.ml -o camlcode.o 
$ gcc -g -O0 test.c camlcode.o -I`ocamlc -where` -L`ocamlc -where` -lthreads -lunixnat -lasmrun -lm -o test

Run it using glibc tunables as follows:

$ LD_PRELOAD=libc_malloc_debug.so.0 GLIBC_TUNABLES=glibc.malloc.check=1:glibc.malloc.perturb=42 gdb ./test

It will crash in:

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, 
    signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7d34793 in __pthread_kill_internal (threadid=<optimized out>, 
    signo=6) at pthread_kill.c:78
#2  0x00007ffff7cdbd0e in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7cc3942 in __GI_abort () at abort.c:79
#4  0x00007ffff7cc47a7 in __libc_message_impl (
    fmt=fmt@entry=0x7ffff7e760a0 "%s") at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007ffff7d279c9 in __GI___libc_fatal (
    message=message@entry=0x7ffff7fa4a71 "free(): invalid pointer")
    at ../sysdeps/posix/libc_fatal.c:141
#6  0x00007ffff7f9dc66 in malloc_printerr (
    str=0x7ffff7fa4a71 "free(): invalid pointer")
    at /usr/src/debug/glibc-2.40-3.fc41.x86_64/malloc/malloc.c:5774
#7  free_check (mem=mem@entry=0xd5d5d5d5d5d5d5d5)
    at /usr/src/debug/glibc-2.40-3.fc41.x86_64/malloc/malloc-check.c:228
#8  0x00007ffff7fa0111 in free_check (mem=0xd5d5d5d5d5d5d5d5)
    at /usr/src/debug/glibc-2.40-3.fc41.x86_64/malloc/malloc-check.c:215
#9  __debug_free (mem=0xd5d5d5d5d5d5d5d5) at malloc-debug.c:208
#10 0x000000000044fc85 in thread_detach_from_runtime ()
#11 0x000000000045004a in caml_c_thread_unregister ()
#12 0x00000000004020d0 in main (argc=1, argv=0x7fffffffe0d8) at test.c:10

To close this out: this was discussed upstream and a solution was reached in ocaml/ocaml issue #13400 on GitHub, “Uninitialized free in caml_c_thread_unregister if you call it from the main thread”.