Corrupted backtraces when omitting option `-g` to `ocamlopt`

While compiling with ocamlopt and running programs with OCAMLRUNPARAM=b, I ran into backtraces that were obviously wrong (mentioning raise locations that couldn’t possibly escape, in functions that weren’t even called). Once I had a minimal example, I tested it against all the OCaml versions I had at my disposal, and it happens with all of them. The issue doesn’t show up with ocamlc though.

Then I realized I had forgotten to pass option -g to ocamlopt. With that option, the backtraces make sense again. It seems reasonable to me that this option is necessary for getting a backtrace, but I’m wondering: why do I get a (nonsensical) backtrace at all when this option is not given to ocamlopt? Is this regarded as normal?

Repro:

let _delayed_bomb () =
  let _ = Hashtbl.create in
          (* ^^ module Hashtbl specifically seems
                to be playing a role in the issue *)
  ()

let () = assert false (* test.ml:5 *)

Compile and run (Linux):

# omit option `-g` here:
ocamlopt test.ml -o test.exe  &&  OCAMLRUNPARAM=b ./test.exe

Observe nonsensical backtrace (here with OCaml 4.13.1, but I’ve also tested 4.07.1 + flambda, 4.08.1, 4.11.1, 4.12.0 + flambda + fp):

Fatal error: exception Assert_failure("test.ml", 5, 9)
Raised at Stdlib__String.index_rec in file "string.ml", line 128, characters 19-34
Called from Stdlib__String.contains_from in file "string.ml", line 192, characters 15-34

Whereas with ocamlc I just get the fatal error with no backtrace.

I haven’t looked at it in detail, but my guess is that the following happens:

  • the stdlib is compiled with -g, so backtrace info is recorded for functions from the stdlib
  • since your program references the Hashtbl module, that module gets linked in
  • during the initialization code of the Hashtbl module, some exception is raised (and caught), e.g. in these lines: ocaml/hashtbl.ml at 284834d31767d323aae1cee4ed719cc36aa1fb2c · ocaml/ocaml · GitHub, which end up also calling these from the String module: ocaml/string.ml at 284834d31767d323aae1cee4ed719cc36aa1fb2c · ocaml/ocaml · GitHub (you can see that both of them use exceptions locally for control flow).
  • in native mode, if I recall correctly, there is one global mutable region used to store the backtrace (it is set when raising and read later when you ask for the backtrace). When your module raises an exception, since it wasn’t compiled with -g, it does not properly set the backtrace info (or only partially sets it), so when you read it, you actually read the last backtrace that was correctly set.
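To make the guess above concrete, here is a toy model of it. This is not the OCaml runtime: the names buffer, raise_with_debug_info and raise_without_debug_info are made up for illustration, and the assumption that a raise site compiled without -g leaves the buffer untouched is exactly the hypothesis being illustrated, not something checked against the runtime sources.

```ocaml
(* Toy model only: this is NOT how the runtime is implemented. *)

(* Hypothetical stand-in for the single global backtrace region. *)
let buffer : string list ref = ref []

exception Local     (* raised and caught inside "stdlib" code *)
exception Escaping  (* raised by "user" code compiled without -g *)

(* Models a raise site compiled with -g: it overwrites the buffer
   with its own location before raising. *)
let raise_with_debug_info loc exn =
  buffer := [loc];
  raise exn

(* Models a raise site compiled without -g: by assumption, it raises
   without touching the buffer. *)
let raise_without_debug_info exn = raise exn

let stale_backtrace () =
  (* Step 1: stdlib-like code raises and catches an exception locally. *)
  (try raise_with_debug_info "string.ml, line 128" Local
   with Local -> ());
  (* Step 2: user code raises; whoever reads the buffer afterwards
     sees the leftover entry from step 1. *)
  try raise_without_debug_info Escaping
  with Escaping -> !buffer
```

In this model, stale_backtrace () returns the location recorded by the earlier, already-handled exception, which is the same shape as the nonsensical backtrace in the original report.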

Ah, that’s a brilliant explanation! [With just one detail: Hashtbl doesn’t raise an exception itself, at least not in the lines you pointed to because Sys.getenv "OCAMLRUNPARAM" succeeds; but it calls String.contains which is the culprit as you said.] I even confirmed it by setting OCAMLRUNPARAM=bR, so that String.contains would not raise an exception locally, and indeed I got no weird backtrace.
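For readers who don’t want to open the stdlib sources, the pattern in question looks roughly like the sketch below. It is a simplified paraphrase, not the actual stdlib code (the real functions take more arguments and do bounds checking), and randomized_default here is a hypothetical helper that takes the parameter string as an argument instead of reading the environment.

```ocaml
(* Simplified paraphrase of the search helper: it signals "not found"
   by raising Not_found. *)
let rec index_rec s lim i c =
  if i >= lim then raise Not_found
  else if s.[i] = c then i
  else index_rec s lim (i + 1) c

(* The exception is used purely for local control flow: it never
   escapes this function. *)
let contains s c =
  try
    ignore (index_rec s (String.length s) 0 c);
    true
  with Not_found -> false

(* Hypothetical stand-in for the check done when Hashtbl is initialized:
   is the 'R' flag present in the runtime parameters? *)
let randomized_default params = contains params 'R'
```

With params = "b" the search fails, so Not_found is raised and caught inside contains; with params = "bR" the character is found and no exception is raised at all, which matches the observation that the weird backtrace disappears with OCAMLRUNPARAM=bR.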

Why doesn’t the backtrace mention Hashtbl though, the initial call site? Since it is in the stdlib too, it should have debug symbols if String has them.

I take several lessons from this.

  1. opam compiles the stdlib with debug symbols (unless it is an option I have set long ago and forgot about).
  2. dune compiles with -g too (my initial case was with a custom library I had built with dune, and my dune config didn’t mention that flag).
  3. An accidental observation that is more surprising to me: in code like the following, the local non-escaping exception is not compiled to a mere jump, even if I fully unroll the recursive function (iter is actually tail-recursive, so I’d like it to be inlined once and turned into a loop, but I’m not sure how to make flambda do this); at least, it still affects the backtrace info somehow:
let[@specialize] rec iter f xs =
  match xs with
  | []     -> ()
  | x::xs' -> f x ; iter f xs'              (* line 4 *)

let find () =
  let exception Found in
  try
    (iter[@specialized][@unrolled 8])       (* line 9 *)
      (fun x -> if x = 3 then raise Found)  (* line 10 *)
      [0; 1; 2; 3; 4; 5] ;
    false
  with Found ->
    true

The backtrace I get is:

Raised at Test.find.(fun) in file "test.ml" (inlined), line 10 […]
Raised by primitive operation at Test.iter in file "test.ml" (inlined), line 4 […]
Raised by primitive operation at Test.iter in file "test.ml" (inlined), line 4 […]
Raised by primitive operation at Test.iter in file "test.ml" (inlined), line 4 […]
Raised by primitive operation at Test.iter in file "test.ml" (inlined), line 4 […]
Raised by primitive operation at Test.find in file "test.ml", line 9 […]
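A possible variant, sketched here without the flambda attributes: Stdlib provides raise_notrace, which is documented as a faster raise that does not record the backtrace. Using it for an exception that serves only as local control flow should keep that exception out of the backtrace info. This says nothing about whether flambda can compile the raise to a jump, and taking the list as a parameter is a change made here only so the function can be tried on several inputs.

```ocaml
let rec iter f xs =
  match xs with
  | [] -> ()
  | x :: xs' -> f x; iter f xs'

(* Same search as above, but the local exception is raised with
   raise_notrace, so no backtrace is recorded for it. *)
let find xs =
  let exception Found in
  try
    iter (fun x -> if x = 3 then raise_notrace Found) xs;
    false
  with Found ->
    true
```

For example, find [0; 1; 2; 3; 4; 5] evaluates to true and find [0; 1; 2] to false, exactly as with a plain raise; only the backtrace bookkeeping differs.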

Oh and I forgot the main lesson of course: the native compiler / linker happily lets you mix code with debug symbols and code without, and doesn’t check that you do it consistently when printing backtraces (I get that this might be non-trivial to implement).