I’m trying to upgrade the OCaml version we are using to compile my program.
My program initially worked with OCaml 4.14.1.
When I upgraded to OCaml 5.0, everything worked without modifying the code, but with OCaml 5.1 I get a segfault during the initialization phase of my program. The segfault happens when the caml_darken function of the major GC is called.
gdb shows the following backtrace:
#0 0x0000555556234c18 in caml_darken (ignored=<optimized out>, v=93825007432256, state=0x5555569ce1a0) at runtime/major_gc.c:1070
#1 caml_darken (state=0x5555569ce1a0, v=<optimized out>, ignored=<optimized out>) at runtime/major_gc.c:1052
#2 0x000055555622c6fb in scan_native_globals (fdata=0x5555569ce1a0, f=0x555556234bc0 <caml_darken>) at runtime/globroots.c:205
#3 caml_scan_global_roots (f=f@entry=0x555556234bc0 <caml_darken>, fdata=fdata@entry=0x5555569ce1a0) at runtime/globroots.c:241
#4 0x00005555562345ea in cycle_all_domains_callback (domain=domain@entry=0x5555569ce1a0, unused=unused@entry=0x0, participating_count=<optimized out>, participating=participating@entry=0x5555569c5f20 <stw_request+64>) at runtime/major_gc.c:1346
#5 0x0000555556224d08 in caml_try_run_on_all_domains_with_spin_work (sync=sync@entry=1, handler=handler@entry=0x555556234490 <cycle_all_domains_callback>, data=data@entry=0x0, leader_setup=leader_setup@entry=0x0, enter_spin_callback=enter_spin_callback@entry=0x0,
enter_spin_data=enter_spin_data@entry=0x0) at runtime/domain.c:1480
#6 0x0000555556224dfd in caml_try_run_on_all_domains (handler=handler@entry=0x555556234490 <cycle_all_domains_callback>, data=data@entry=0x0, leader_setup=leader_setup@entry=0x0) at runtime/domain.c:1502
#7 0x000055555623600e in major_collection_slice (howmuch=<optimized out>, participant_count=participant_count@entry=0, barrier_participants=barrier_participants@entry=0x0, mode=mode@entry=Slice_interruptible) at runtime/major_gc.c:1691
#8 0x00005555562361d6 in caml_major_collection_slice (howmuch=howmuch@entry=-1) at runtime/major_gc.c:1708
#9 0x0000555556224665 in caml_poll_gc_work () at runtime/domain.c:1631
#10 0x000055555623e4bc in caml_do_pending_actions_exn () at runtime/signals.c:308
#11 0x000055555623997d in caml_alloc_small_dispatch (dom_st=0x5555569ce1a0, wosize=<optimized out>, flags=<optimized out>, nallocs=<optimized out>, encoded_alloc_lens=<optimized out>) at runtime/minor_gc.c:816
#12 <signal handler called>
#13 0x0000555555b5e786 in camlKernel__FCSet.Make_1302 () at kernel/FCSet.ml:118
#14 0x0000555555b5decb in camlKernel__FCSet.fun_2297 () at kernel/FCSet.ml:118
#15 0x0000555555b8f4d8 in camlKernel__Datatype.With_collections_8414 () at kernel/datatype.ml:1920
#16 0x0000555555b8fe2f in camlKernel__Datatype.fun_59589 () at kernel/datatype.ml:1966
#17 0x0000555555bd6fda in camlKernel__Cil_datatype.entry () at kernel/cil_datatype.ml:1541
#18 0x0000555555b4b8eb in caml_program ()
#19 <signal handler called>
#20 0x0000555556245176 in caml_startup_common (pooling=<optimized out>, argv=0x7fffffffd6b8) at runtime/startup_nat.c:132
#21 caml_startup_common (argv=0x7fffffffd6b8, pooling=<optimized out>) at runtime/startup_nat.c:88
#22 0x00005555562451ef in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:139
#23 caml_startup (argv=<optimized out>) at runtime/startup_nat.c:144
#24 caml_main (argv=<optimized out>) at runtime/startup_nat.c:151
#25 0x0000555555b4a5b2 in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37
Do you have some ideas to debug the program and identify more clearly the problem, or the value that leads to the segfault?
Do you have C bindings in your code? One thing to watch out for is that OCaml 5 does not support naked pointers. Code using naked pointers in OCaml 4 should be updated for OCaml 5. There is a naked pointer checker available as an opam compiler switch on OCaml 4 to identify naked pointers in your code.
If not, this may be a bug in the compiler. Could you please open an issue on the OCaml GitHub repository with a reproduction case?
To add to @kayceesrk’s good suggestion, there is an unknown number (but AFAIR greater than a dozen) of packages on opam that use naked pointers but have been marked as compatible with OCaml 5, and which show up as green in the CI.
I posted below a simple method to check whether one idiom that creates naked pointers is present in your dependencies (written for opam, but it can be adapted to other packaging methods):
The usefulness of the compile-time option that detects possible naked pointers at runtime (the badly named nnp “checker”) seems limited.
Thank you everyone for your help.
I finally tried installing OCaml 4.14.1 with the nnp-checker. Unfortunately, it didn’t find anything when I ran my program on various test cases.
I investigated further by looking at all the occurrences of Obj in my code. When I removed one of them, everything worked again with OCaml 5.1. That occurrence was Obj.field (Obj.repr 0L) 0.
I do not really understand what is wrong with this code or why it would create a naked pointer, so I ran some tests to understand what can happen in my case. Here are the results of my experiments with OCaml 5.1:
OCaml version 5.1.0
Enter #help;; for help.
# Obj.is_block (Obj.repr 0L);;
- : bool = true
# Obj.tag (Obj.repr 0L);;
- : int = 255
# Obj.size (Obj.repr 0L);;
- : int = 2
# Obj.field (Obj.repr 0L) 0;;
Segmentation fault (core dumped)
The toplevel results are the same with OCaml 5.0, even though my code doesn’t crash with that version. Did I miss some change in the representation of Int64 in OCaml 5?
Is your program still building correctly with 5.0? In particular, are the dependencies definitely identical when you compile with 5.1 (same versions of all opam packages)? Is the code public?
Unfortunately my code is not public, and I am not sure I have enough time to bisect…
The value 0L is an int64 value, which is represented by a Custom_tag object (the tag being 255).
The first field of the custom tag object is a pointer to a C structure holding the custom operations for that type. See
The code Obj.field (Obj.repr 0L) 0 gets this first field. The result of this expression is indeed a naked pointer (a pointer outside the OCaml heap). The OCaml 5 toplevel tries to print it and crashes. OCaml 4 is aware of naked pointers and just prints <abstr> instead. In a compiled program, the pointer becomes a problem once the major GC reaches it, for instance if it is stored in a global, which would be consistent with the scan_native_globals and caml_darken frames in your backtrace.
I am not sure what Obj.field (Obj.repr 0L) 0 is supposed to do in your original code. Depending on what you were trying to do, there are easy ways around this.
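A minimal sketch of the layout described above, assuming a standard 32- or 64-bit platform. It only reads the block header (through Obj.is_block, Obj.tag and Obj.size) and never touches field 0, so the custom-operations pointer is never turned into an OCaml value. The name expected_size is only for illustration.

```ocaml
(* Inspect a boxed int64 through its header only.
   Obj.is_block, Obj.tag and Obj.size read the header word, never the
   fields, so no pointer to the custom-operations struct is produced. *)
let r = Obj.repr 0L

(* Boxed int64 values are custom blocks: Obj.custom_tag is 255. *)
let () = assert (Obj.is_block r)
let () = assert (Obj.tag r = Obj.custom_tag)

(* Field 0 holds the pointer to the custom operations (outside the heap);
   the remaining words hold the 8-byte payload: 1 + 1 = 2 words on a
   64-bit platform, 1 + 2 = 3 words on a 32-bit one. *)
let expected_size = 1 + 8 / (Sys.word_size / 8)
let () = assert (Obj.size r = expected_size)

(* Safe: prints header information only. *)
let () = Printf.printf "tag = %d, size = %d\n" (Obj.tag r) (Obj.size r)

(* Unsafe in OCaml 5: [Obj.field r 0] would return the custom-operations
   pointer as if it were an OCaml value, i.e. a naked pointer. *)
```

On a 64-bit machine this prints tag = 255, size = 2, matching the toplevel session earlier in the thread.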
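Since the purpose of the original expression is unknown, here is only one possible direction, stated as an assumption: if the code needed to recognise custom blocks (such as boxed Int64 values) in generic Obj-based code, a header check is enough and keeps every intermediate value a legitimate OCaml value. The helper name is_custom_block is hypothetical. It cannot tell an int64 apart from other custom blocks (int32, nativeint, bigarrays, ...); that would require comparing the custom-operations pointer, which would probably be better done in a C stub than by pulling the pointer into OCaml.

```ocaml
(* Hypothetical helper: true when [v] is a custom block (int64, int32,
   nativeint, bigarray, ...). Only the header is read, so no naked
   pointer is ever created on the OCaml side. *)
let is_custom_block (v : Obj.t) : bool =
  Obj.is_block v && Obj.tag v = Obj.custom_tag

(* Usage: classify a few values of different representations. *)
let () =
  List.iter
    (fun (name, v) ->
       Printf.printf "%-8s custom=%b\n" name (is_custom_block v))
    [ ("0L", Obj.repr 0L);
      ("0l", Obj.repr 0l);
      ("42", Obj.repr 42);
      ("\"abc\"", Obj.repr "abc");
      ("1.0", Obj.repr 1.0) ]
```

Boxed int64 and int32 values report true; the immediate integer, the string (String_tag) and the boxed float (Double_tag) report false.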