Debugging segmentation faults

I’ve just started porting some code to OCaml and have written ~1kLOC of vanilla OCaml code. The only things I’m using are PPX deriving show and ord. When I run my little program I get:

./run.sh: line 1:  1896 Segmentation fault: 11

I’m on an M2 Mac so I fire up lldb and it says:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x9)
    frame #0: 0x000000010001169c w.exe`camlDune__exe__P__anon_fn$5bp$2eml$3a35$2c50$2d$2d75$5d_522 + 132
w.exe`camlDune__exe__P__anon_fn$5bp$2eml$3a35$2c50$2d$2d75$5d_522:
->  0x10001169c <+132>: ldr    x1, [x22, #0x8]
    0x1000116a0 <+136>: ldr    x0, [x22]
    0x1000116a4 <+140>: bl     0x100011588               ; camlDune__exe__P__anon_fn$5bp$2eml$3a35$2c51$2d$2d68$5d_599
    0x1000116a8 <+144>: orr    x0, xzr, #0x3

I’m struggling to read that but it looks like it is loading some data from an invalid location. Just in case it is a stack overflow I tried a stack trace:

(lldb) thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x9)
  * frame #0: 0x000000010001169c w.exe`camlDune__exe__P__anon_fn$5bp$2eml$3a35$2c50$2d$2d75$5d_522 + 132
    frame #1: 0x000000010016fc40 w.exe`camlStdlib__Array__fold_left_625 + 136
    frame #2: 0x0000000100010398 w.exe`camlDune__exe__P__anon_fn_496 + 128
    frame #3: 0x00000001001ab674 w.exe`camlStdlib__Format__k_7979 + 28
    frame #4: 0x0000000100010918 w.exe`camlDune__exe__P__parse_453 + 16
    frame #5: 0x000000010019bcc0 w.exe`camlStdlib__Fun__protect_85 + 96
    frame #6: 0x00000001000081d8 w.exe`camlDune__exe__Language__check_141 + 80
    frame #7: 0x0000000100006bf4 w.exe`camlDune__exe__Wiki__code_begin + 188
    frame #8: 0x00000001000034dc w.exe`caml_program + 4716
    frame #9: 0x00000001001feb0c w.exe`caml_start_program + 104
    frame #10: 0x00000001001d9058 w.exe`caml_startup_common(argv=0x00000001082562a8, pooling=<unavailable>) at startup_nat.c:160:9 [opt]
    frame #11: 0x00000001001d90cc w.exe`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:167:10 [opt]
    frame #12: 0x00000001001d90c4 w.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:172:15 [opt]
    frame #13: 0x00000001001d90c4 w.exe`caml_main(argv=<unavailable>) at startup_nat.c:179:3 [opt]
    frame #14: 0x00000001001d912c w.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #15: 0x00000001b1657e50 dyld`start + 2544

Anyone know what’s going on or how best to proceed?

I’m using OCaml 4.14.1 BTW.

This doesn’t look like something we know about, so I would advise submitting an issue. If the bug is reproducible from public code, reproduction steps would help. It would also be nice to know if you get the same error with the same code on another computer, or if it’s specific to your setup in some way.
The backtrace shows that the error is in the body of an anonymous function at line 35 in p.ml, characters 50 to 75; knowing what this function looks like would help. If it’s not too big, the full assembly code for this function would be interesting to know.
Finally, please include the configuration of your compiler. If you’re using an opam switch, which compiler package you have installed should be enough (the vanilla ocaml-base-compiler, or ocaml-system, or ocaml-variants; in the last case the exact version is important, as well as any ocaml-option-* package you might have installed). If you’re using something else, the output of ocamlopt -config would be nice.
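
For readers wondering how the file name and position were read off the backtrace: the symbol names in this thread contain `$xx` sequences that look like hex-escaped characters. Below is a minimal sketch of a decoder, assuming every escape is a `$` followed by exactly two hex digits. The names `hex_value` and `decode_symbol` are made up for this illustration and are not part of any OCaml tool.

```ocaml
(* Value of one hexadecimal digit, if the character is one. *)
let hex_value (c : char) : int option =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

(* Replace each "$xx" escape with the character whose code is xx.
   Assumption: escapes are always '$' followed by two hex digits;
   anything else is copied through unchanged. *)
let decode_symbol (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  (* Decoded character if an escape starts at position i. *)
  let escape_at i =
    if s.[i] = '$' && i + 2 < n then
      (match hex_value s.[i + 1], hex_value s.[i + 2] with
       | Some hi, Some lo -> Some (Char.chr ((hi * 16) + lo))
       | _ -> None)
    else None
  in
  let rec go i =
    if i < n then
      (match escape_at i with
       | Some c -> Buffer.add_char buf c; go (i + 3)
       | None -> Buffer.add_char buf s.[i]; go (i + 1))
  in
  go 0;
  Buffer.contents buf
```

Applied to the name in frame #0, `decode_symbol "anon_fn$5bp$2eml$3a35$2c50$2d$2d75$5d_522"` gives `anon_fn[p.ml:35,50--75]_522`, which matches the location quoted above.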

For vanilla OCaml code there are few known bugs. You can start by checking whether your program falls into one of these cases:

an anonymous function at line 35 in p.ml, characters 50 to 75; knowing what this function looks like would help.

Thanks. The function appears to be PPX generated:

[%derive.show: (L.token * Ast.pos) array]

However, if I comment out that (debug printf) line the program just crashes elsewhere:

(lldb) run
Process 4994 launched: '...' (arm64)
Process 4994 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1)
    frame #0: 0x0000000100011828 w.exe`camlDune__exe__P__pChoose_479 + 64
w.exe`camlDune__exe__P__pChoose_479:
->  0x100011828 <+64>: ldr    x0, [x8]
    0x10001182c <+68>: ldr    x1, [sp]
    0x100011830 <+72>: ldr    x10, [x1]
    0x100011834 <+76>: blr    x10
Target 0: (w.exe) stopped.
(lldb) thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1)
  * frame #0: 0x0000000100011828 w.exe`camlDune__exe__P__pChoose_479 + 64
    frame #1: 0x0000000100011938 w.exe`camlDune__exe__P__pOpt_523 + 32
    frame #2: 0x0000000100017b04 w.exe`camlDune__exe__L__apply_1481 + 44
    frame #3: 0x0000000100017b04 w.exe`camlDune__exe__L__apply_1481 + 44
    frame #4: 0x0000000100017b04 w.exe`camlDune__exe__L__apply_1481 + 44
    frame #5: 0x0000000100017b04 w.exe`camlDune__exe__L__apply_1481 + 44
    frame #6: 0x00000001000120f8 w.exe`camlDune__exe__P__pAlt_757 + 104
    frame #7: 0x00000001000163f8 w.exe`camlDune__exe__P__parseStatements_2818 + 24
    frame #8: 0x0000000100011514 w.exe`camlDune__exe__P__parse_453 + 2556
    frame #9: 0x000000010019bd60 w.exe`camlStdlib__Fun__protect_85 + 96
    frame #10: 0x00000001000084d8 w.exe`camlDune__exe__Language__check_141 + 80
    frame #11: 0x0000000100006ef4 w.exe`camlDune__exe__Wiki__code_begin + 188
    frame #12: 0x00000001000037dc w.exe`caml_program + 4716
    frame #13: 0x00000001001febac w.exe`caml_start_program + 104
    frame #14: 0x00000001001d90f8 w.exe`caml_startup_common(argv=0x00000001001a13a0, pooling=<unavailable>) at startup_nat.c:160:9 [opt]
    frame #15: 0x00000001001d916c w.exe`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:167:10 [opt]
    frame #16: 0x00000001001d9164 w.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:172:15 [opt]
    frame #17: 0x00000001001d9164 w.exe`caml_main(argv=<unavailable>) at startup_nat.c:179:3 [opt]
    frame #18: 0x00000001001d91cc w.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #19: 0x00000001b1657e50 dyld`start + 2544

Finally, please include the configuration of your compiler. If you’re using an opam switch, which compiler package you have installed should be enough (the vanilla ocaml-base-compiler, or ocaml-system, or ocaml-variants; in the last case the exact version is important, as well as any ocaml-option-* package you might have installed).

opam switch list-available seems to highlight 4.14.1+options. I have the following installed:

ocaml-option-address-sanitizer         1                                  Set OCaml to be compiled with address sanitizer
ocaml-option-afl                       1                                  Set OCaml to be compiled with afl-fuzz instrumentation
ocaml-option-bytecode-only             1                                  Compile OCaml without the native-code compiler
ocaml-option-default-unsafe-string     1                                  Set OCaml to be compiled without safe strings by default
ocaml-option-flambda                   1                                  Set OCaml to be compiled with flambda activated
ocaml-option-musl                      1                                  Set OCaml to be compiled with musl-gcc
ocaml-option-nnp                       1                                  Set OCaml to be compiled with --disable-naked-pointers
ocaml-option-nnpchecker                1                                  Set OCaml to be compiled with --enable-naked-pointers-checker
ocaml-option-no-flat-float-array       1                                  Set OCaml to be compiled with --disable-flat-float-array
ocaml-option-spacetime                 1                                  Set OCaml to be compiled with spacetime activated
ocaml-option-static                    1                                  Set OCaml to be compiled with musl-gcc -static
ocaml-options-only-afl                 1                                  Ensure that OCaml is compiled with AFL support enabled, and no other custom options
ocaml-options-only-flambda             1                                  Ensure that OCaml is compiled with flambda activated, and no other custom options
ocaml-options-only-flambda-fp          1                                  Ensure that OCaml is compiled with flambda and frame-pointer enabled, and no other custom options
ocaml-options-only-fp                  1                                  Ensure that OCaml is compiled with only frame-pointer enabled, and no other custom options
ocaml-options-only-nnp                 1                                  Ensure that OCaml is compiled with no-naked-pointers, and no other custom options
ocaml-options-only-nnpchecker          1                                  Ensure that OCaml is compiled with enable-naked-pointers-checker, and no other custom options
ocaml-options-only-no-flat-float-array 1                                  Ensure that OCaml is compiled with no-flat-float-array, and no other custom options
ocaml-options-vanilla                  1                                  Ensure that OCaml is compiled with no special options enabled

I’ll try 4.14.1

That’s the list of all available options, not only the ones you have installed. You can use opam list ocaml-option-* to see which ones are installed and which are not.

(Or opam list --installed ocaml-option-* to only see the ones that are installed)

Sounds like memory is being overwritten. Are you using unsafe_set anywhere?

% opam list "ocaml-option-*"
# Packages matching: name-match(ocaml-option-*) & (installed | available)
# Name                             # Installed # Synopsis
ocaml-option-address-sanitizer     --          Set OCaml to be compiled with address sanitizer
ocaml-option-afl                   --          Set OCaml to be compiled with afl-fuzz instrumentation
ocaml-option-bytecode-only         --          Compile OCaml without the native-code compiler
ocaml-option-default-unsafe-string --          Set OCaml to be compiled without safe strings by default
ocaml-option-flambda               --          Set OCaml to be compiled with flambda activated
ocaml-option-musl                  --          Set OCaml to be compiled with musl-gcc
ocaml-option-nnp                   --          Set OCaml to be compiled with --disable-naked-pointers
ocaml-option-nnpchecker            --          Set OCaml to be compiled with --enable-naked-pointers-checker
ocaml-option-no-flat-float-array   --          Set OCaml to be compiled with --disable-flat-float-array
ocaml-option-spacetime             --          Set OCaml to be compiled with spacetime activated
ocaml-option-static                --          Set OCaml to be compiled with musl-gcc -static

I’ve installed vanilla 4.14.1 and I still get a similar error, albeit in yet another function (caml_tuplify2).

Here is the output from:

% ocamlopt -config    
version: 4.14.1    
standard_library_default: ~/.opam/4.14.1/lib/ocaml    
standard_library: ~/.opam/4.14.1/lib/ocaml    
ccomp_type: cc    
c_compiler: cc    
ocamlc_cflags: -O2 -fno-strict-aliasing -fwrapv -pthread    
ocamlc_cppflags: -D_FILE_OFFSET_BITS=64    
ocamlopt_cflags: -O2 -fno-strict-aliasing -fwrapv -pthread    
ocamlopt_cppflags: -D_FILE_OFFSET_BITS=64    
bytecomp_c_compiler: cc -O2 -fno-strict-aliasing -fwrapv -pthread -D_FILE_OFFSET_BITS=64    
native_c_compiler: cc -O2 -fno-strict-aliasing -fwrapv -pthread -D_FILE_OFFSET_BITS=64    
bytecomp_c_libraries: -lm -lpthread    
native_c_libraries: -lm    
native_pack_linker: ld -r -o    
architecture: arm64    
model: default    
int_size: 63    
word_size: 64    
system: macosx    
asm: cc -c -Wno-trigraphs    
asm_cfi_supported: true    
with_frame_pointers: false    
ext_exe:    
ext_obj: .o    
ext_asm: .s    
ext_lib: .a    
ext_dll: .so    
os_type: Unix    
default_executable_name: a.out    
systhread_supported: true    
host: aarch64-apple-darwin22.2.0    
target: aarch64-apple-darwin22.2.0    
flambda: false    
safe_string: true    
default_safe_string: true    
flat_float_array: true    
function_sections: false    
afl_instrument: false    
windows_unicode: false    
supports_shared_libraries: true    
naked_pointers: true    
exec_magic_number: Caml1999X031    
cmi_magic_number: Caml1999I031    
cmo_magic_number: Caml1999O031    
cma_magic_number: Caml1999A031    
cmx_magic_number: Caml1999Y031    
cmxa_magic_number: Caml1999Z031    
ast_impl_magic_number: Caml1999M031    
ast_intf_magic_number: Caml1999N031    
cmxs_magic_number: Caml1999D031    
cmt_magic_number: Caml1999T031    
linear_magic_number: Caml1999L031

There doesn’t seem to be anything unusual in your config.
If you want to debug this yourself, I don’t really have any good advice to give; this could be a bug in the code generation for arm64, or an issue with one of your PPXes (this is unlikely), or some hardware issue (even more unlikely). With the appalling state of native debugger support in OCaml, there’s not much you can do yourself without getting deep into the compiler’s internals. I can point you to some small debugging helpers that I wrote some time ago (here), but they’re specific to gdb and might be a little outdated.
If you can publish either your full code or a sample that is sufficient to trigger a segfault, I strongly advise you to submit a bug report and let the maintainers find out what’s wrong.

Aren’t PPX processors similar to Scheme macros (but far more complex)? In that case, they shouldn’t produce unsafe code. Am I wrong?

(This reminds me of an error I once had with some C code: the optimizer had some issues, so I had to compile without the optimization flag, and then the result was correct. Unfortunately, OCaml doesn’t have many optimization flags. Try the bytecode compiler…)

You could also try comparing against x86, to get a sense of whether it’s a code generation issue or not.

The bytecode compiler is a good suggestion.

Is the crash deterministically in the same function for a given executable? You could try putting a breakpoint at the entry of that function, then single-stepping in the debugger and observing the register values to find out who writes the invalid value (0x1) into the register causing the crash (x8 or x22 in your case; the second crash faults at 0x1 through x8, and the first faults at 0x9 on a load from x22 + 0x8). You can also try to set a watchpoint on the register to find out exactly who writes the invalid value, but I’m not sure whether lldb supports that (it wouldn’t be a hardware watchpoint but would probably be emulated via single-stepping, i.e. slow, but not as slow as manually typing the next-instruction command every time…)
(I have a suspicion that this might be somehow arm64 ABI related, e.g. if a register being used by OCaml is overwritten by C code, but a quick skim through the differences noted in the Apple Developer Documentation doesn’t show anything particularly alarming. Finding exactly where the bad value is stored should help narrow down where the bug is.)
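
One extra hint when reading those register values: OCaml stores an immediate value n (an int, or a constant constructor such as (), false, [] or None) as the machine word 2n + 1, so a word with its low bit set is never a heap pointer. The sketch below only does that arithmetic; the function names are made up for illustration, and it cannot tell you which OCaml value was actually involved.

```ocaml
(* Machine word the native compiler uses for the immediate value n. *)
let tag_immediate (n : int) : int = (n lsl 1) lor 1

(* A word with the low bit set is treated as an immediate, not a pointer. *)
let looks_immediate (word : int) : bool = word land 1 = 1

(* Immediate value encoded by a tagged word. *)
let untag_immediate (word : int) : int = word asr 1

(* Base register value implied by a faulting address and the load offset,
   e.g. `ldr x1, [x22, #0x8]` faulting at 0x9 implies x22 = 0x1. *)
let base_of_fault ~(address : int) ~(offset : int) : int = address - offset
```

Under this reading, both crashes in the thread dereference the word 0x1, which is how OCaml encodes 0, (), false, [] and None. That would be consistent with an immediate being used where a pointer to a block was expected, although the disassembly alone does not prove it.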

Before considering the option of a bug in code generation (which can indeed always happen), I would wonder about the code or its dependencies introducing memory-unsafe behavior by using:

  • unsafe FFI code (basically, all external declarations are suspicious)
  • or Obj.magic or the Obj module in general
  • or possibly Array.unsafe_{get,set} or other memory-unsafe library functions (they all have unsafe in the name; but some unsafe-named functions are in fact memory-safe, they just have subtle unchecked preconditions)

@Jon_Harrop, do you yourself use external declarations or the Obj module? Are you relying (in addition to some PPX extensions) on some external libraries that you suspect may be doing so?

Also: using the “debug runtime” may help debugging memory errors by failing earlier with a clearer error message. There is some information on how to use the debug runtimes, and other related tips, in HACKING.adoc on the trunk branch of the ocaml/ocaml repository on GitHub.

Sounds like memory is being overwritten. Are you using unsafe_set anywhere?

Nope. This code is entirely safe AFAICT.

The plot thickens: I also get a segfault when I run it as bytecode!

EDIT: I was wrong. I’m using a home-grown ExtArray that uses Obj.magic. I’ll try replacing that first…

Turns out it was my buggy use of Obj.magic. Thanks!
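
For anyone landing here with a similar home-grown container: the original ExtArray was not posted, so the following is only a hypothetical sketch of a growable array that avoids Obj.magic by storing unused slots as None. The module name `Ext_array` and its interface are assumptions made for this example.

```ocaml
(* Hypothetical growable array; NOT the poster's ExtArray.
   Unused slots hold None, so no Obj.magic placeholder is needed. *)
module Ext_array : sig
  type 'a t
  val create : unit -> 'a t
  val length : 'a t -> int
  val push : 'a t -> 'a -> unit
  val get : 'a t -> int -> 'a
end = struct
  type 'a t = { mutable data : 'a option array; mutable len : int }

  let create () = { data = [||]; len = 0 }
  let length t = t.len

  (* Append x, doubling the backing array when it is full. *)
  let push t x =
    if t.len = Array.length t.data then begin
      let capacity = max 4 (2 * t.len) in
      let data = Array.make capacity None in
      Array.blit t.data 0 data 0 t.len;
      t.data <- data
    end;
    t.data.(t.len) <- Some x;
    t.len <- t.len + 1

  (* Bounds are checked against the logical length, not the capacity. *)
  let get t i =
    if i < 0 || i >= t.len then invalid_arg "Ext_array.get";
    match t.data.(i) with
    | Some x -> x
    | None -> assert false (* slots below len are always filled *)
end
```

The option wrapper costs an extra allocation per element, which is the trade-off for staying inside the type system. Another common choice is to ask the caller for a dummy element of the right type at creation time.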
