Debug pointer corruption in custom blocks

I’m trying to debug third-party C bindings, which fail with an attempt to free an invalid pointer. This pointer is allocated with caml_stat_alloc and freed with caml_stat_free. I print the pointer at allocation time and at finalization time, and I get the same value both times. The finalization function is never called in between. The pointer itself is stored inside a custom block allocated by caml_alloc_final.

Using Valgrind/gdb I could confirm the bug is triggered during garbage collection (see the stack trace below). I see no reason why the pointer would become invalid, since this memory is outside the OCaml heap. Also, replacing caml_stat_* with malloc and free has no effect at all.

Despite this very vague description, I was hoping to get some general tips on how to investigate what’s happening. If I can’t find anything I’ll give ctypes a try, but I’d rather not rewrite the bindings myself…

Cheers!

#2  0x00007ffff4aef508 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff4bfa28d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff4af5c1a in malloc_printerr (str=str@entry=0x7ffff4bf843b "free(): invalid pointer") at malloc.c:5341
#4  0x00007ffff4af9b3e in free_check (mem=<optimized out>, caller=<optimized out>) at hooks.c:254
#5  0x0000555555bab776 in caml_empty_minor_heap () at minor_gc.c:388
#6  0x0000555555babbdb in caml_gc_dispatch () at minor_gc.c:446
#7  0x0000555555ba86db in caml_garbage_collection () at signals_asm.c:78
#8  0x0000555555bc243c in caml_call_gc ()

If a pointer is allocated by malloc and then freed exactly once, but free still reports an “invalid pointer”, then you have memory corruption, i.e., the metadata that the malloc memory manager keeps next to the allocated block was corrupted. This usually happens after a buffer overrun (i.e., an indexing error). In other words, the problem most likely lies somewhere far away from OCaml.
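To make the mechanism concrete, here is a minimal, portable sketch of how an off-by-one write can damage the bookkeeping bytes stored right next to an allocation. It is not how glibc lays out its metadata; it is a hand-rolled guard-byte wrapper, and the names guard_alloc, guard_check, guard_free and demo_off_by_one are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, chosen for this sketch only:
   [header: size, padded to 16][head guard][user bytes][tail guard] */
#define HEADER_SIZE 16
#define GUARD_SIZE  16
#define GUARD_BYTE  0xA5

/* Allocate `size` user bytes surrounded by guard bytes. */
static void *guard_alloc(size_t size) {
    unsigned char *raw = malloc(HEADER_SIZE + 2 * GUARD_SIZE + size);
    if (raw == NULL) return NULL;
    memcpy(raw, &size, sizeof size);                    /* remember the size */
    memset(raw + HEADER_SIZE, GUARD_BYTE, GUARD_SIZE);  /* head guard */
    memset(raw + HEADER_SIZE + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + HEADER_SIZE + GUARD_SIZE;
}

/* Return 0 if both guards are intact, 1 if the head guard was
   overwritten (underrun), 2 if the tail guard was (overrun). */
static int guard_check(const void *user) {
    assert(user != NULL);
    const unsigned char *p = user;
    const unsigned char *head = p - GUARD_SIZE;
    size_t size, i;
    for (i = 0; i < GUARD_SIZE; i++)
        if (head[i] != GUARD_BYTE) return 1;
    /* The stored size is only trusted once the head guard is intact. */
    memcpy(&size, head - HEADER_SIZE, sizeof size);
    for (i = 0; i < GUARD_SIZE; i++)
        if (p[size + i] != GUARD_BYTE) return 2;
    return 0;
}

/* Check the guards, release the block, and report what the check found. */
static int guard_free(void *user) {
    int status = guard_check(user);
    free((unsigned char *)user - GUARD_SIZE - HEADER_SIZE);
    return status;
}

/* A classic indexing error: the buffer holds 8 bytes, index 8 is one too far. */
static int demo_off_by_one(void) {
    char *buf = guard_alloc(8);
    if (buf == NULL) return -1;
    memset(buf, 'x', 8);
    buf[8] = '\0';            /* bug: lands in the tail guard */
    return guard_free(buf);   /* the damage is only noticed here */
}
```

The point of the sketch is the timing: the bad write itself goes unnoticed, and the damage is only detected later, when the block is released, which may be in a completely different part of the program (for example, inside a finalizer run by the GC).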

Ideally, you should try to reproduce this bug without OCaml involved, then you can easily employ Valgrind. You may also find the mcheck function useful to debug these sorts of bugs.

Thanks @ivg, that makes a lot of sense. I’m going to extract the C part and investigate further.

One question though: in the case of a buffer overrun, shouldn’t I see Invalid write errors when running under Valgrind (even with OCaml involved)?

Yes, you should. However, it is not 100% guaranteed; false negatives are possible (1). Of course, OCaml with its GC can complicate things for Valgrind, which is why I’m suggesting that you try to reproduce it without OCaml.


1) Also make sure that you’re invoking Valgrind correctly and reading through the whole log. There should be tons of false positives induced by the GC, so the fact that you’re not seeing any is a little bit suspicious.

I think I found the issue, thanks a lot for the precious explanations!
