Finalizers, value lifetimes and `Sys.opaque_identity`

Dear discuss,

I think some of you may be interested in the bugtracker thread MPR#7861 (GC collects values it shouldn’t with flambda), which is about an interaction between finalizers and compiler optimizations that advanced users should be aware of.

The bug report was about flambda breaking the behavior of a program using Lwt_react. @stedolan figured out that it was not a bug in the compiler, but a programming error in Lwt. Quoting Stephen:

It seems to be an issue with the implementation of Lwt_react.with_finaliser rather than with the OCaml GC. That function is implemented as follows:

 let with_finaliser f event =
   let r = ref () in
   Gc.finalise (finalise f) r;
   map (fun x -> ignore r; x) event

The idea is to add a finaliser to the reference r, and ensure that the reference r is reachable from the returned event. However, flambda (correctly!) notices that r is not used in the returned event (since it is ignored), and optimises the use of r away. This means that the GC is free to collect r and run the finaliser immediately.

The simplest fix is to ensure that the reference to r is not optimised away, by hiding the fact that it is unused from the optimiser using Sys.opaque_identity:

map (fun x -> ignore (Sys.opaque_identity r); x) event
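For readers without Lwt at hand, the same pattern can be tried in isolation. The sketch below is a hypothetical stand-in, not Lwt's code: it wraps a plain function instead of a React event, and the names `on_collect` and `f` are invented here. It assumes only `Gc.finalise`, `Gc.full_major` and `Sys.opaque_identity` from the standard library.

```ocaml
(* Hypothetical stand-in for the Lwt_react pattern: attach a finaliser to a
   dummy heap block [r] and tie the lifetime of [r] to the returned closure.
   [on_collect] and [f] are illustrative names, not part of Lwt. *)
let with_finaliser (on_collect : unit -> unit) (f : 'a -> 'b) : 'a -> 'b =
  let r = ref () in
  Gc.finalise (fun _ -> on_collect ()) r;
  (* [Sys.opaque_identity r] is treated like a call to an unknown function,
     so the optimiser cannot conclude that [r] is unused and drop it. *)
  fun x -> ignore (Sys.opaque_identity r); f x

let () =
  let collected = ref false in
  let g = with_finaliser (fun () -> collected := true) succ in
  Gc.full_major ();
  (* [g] is used below, so it is still live here and keeps [r] alive. *)
  assert (not !collected);
  Printf.printf "g 41 = %d\n" (g 41)
```

The assertion only checks the safe direction: while `g` is live, the finaliser has not run. When exactly the finaliser runs once `g` becomes unreachable is up to the GC.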

The Sys.opaque_identity function is one you should know about if you write programs that make assumptions about the behavior of code optimizations. Its documentation is carefully formulated:

For the purposes of optimization, opaque_identity behaves like an unknown (and thus possibly side-effecting) function.

At runtime, opaque_identity disappears altogether.

A typical use of this function is to prevent pure computations from being optimized away in benchmarking loops. For example:

for _round = 1 to 100_000 do
  ignore (Sys.opaque_identity (my_pure_computation ()))
done

• Since 4.03.0
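A runnable variant of the documentation's loop might look like the following. It is only a sketch: `my_pure_computation` here is a made-up workload that takes a size argument, `time_rounds` is a hypothetical helper, and `Sys.time` (CPU time) is used as a rough clock rather than a proper benchmarking library.

```ocaml
(* Hypothetical pure workload: the sum of the integers from 1 to [n]. *)
let my_pure_computation n =
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + i
  done;
  !acc

(* Run the workload [rounds] times and return the CPU time spent, in seconds.
   The inner [Sys.opaque_identity] hides the constant argument from the
   optimiser; the outer one makes the otherwise ignored result count as used. *)
let time_rounds rounds =
  let start = Sys.time () in
  for _round = 1 to rounds do
    ignore
      (Sys.opaque_identity (my_pure_computation (Sys.opaque_identity 100)))
  done;
  Sys.time () -. start

let () =
  Printf.printf "100_000 rounds: %.3f s of CPU time\n" (time_rounds 100_000)
```

Without the wrappers, an optimiser that can see the computation is pure and its result unused would be entitled to delete the loop body, and the measurement would say nothing about the workload.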

(The fact that it is only available since 4.03 means that it can be tricky to use in codebases that wish to support older OCaml versions. In particular, it didn’t exist when this Lwt code was written (and tested to work correctly with the optimizer of the time). If you are designing your own language, think of adding an opaque_identity function from the start!)