On ARM and PowerPC processors (32 and 64 bits), fused multiply-add (FMA) instructions can be generated for a floating-point multiplication followed by a floating-point addition or subtraction, as in `x *. y +. z`. The FMA instruction avoids rounding the intermediate result `x *. y`, which is generally beneficial, but produces floating-point results that differ slightly from those produced by the bytecode interpreter.
And indeed, on a Mac M1 for instance, the following code fails when compiled to native code, whereas it succeeds on x86 or when compiled with the bytecode compiler:
```ocaml
let test x =
  x *. x -. x *. x
[@@ocaml.inline never]

let () =
  assert (test (1. /. 3.) = 0.)
```
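To see why the assertion can fail, note that `Float.fma x y z` from the standard library computes `x *. y +. z` with a single rounding, which is what the fused instruction does. The sketch below (assuming `x = 1/3`, as in the example above) uses it to compute the rounding error of the product `x *. x`. When one of the two products in `x *. x -. x *. x` is fused into an FMA, this error (possibly negated) is what `test` returns instead of 0.

```ocaml
let x = 1. /. 3.

(* The product rounded to the nearest double, as a plain x *. x computes it *)
let p = x *. x

(* Exact x * x minus p, rounded only once: the rounding error of p.
   There is no addition here, so this line itself cannot be contracted. *)
let err = Float.fma x x (-. p)

let () = Printf.printf "x *. x = %h\nrounding error = %h\n" p err
```

Because `1/3` has a full 53-bit significand, its square is not exactly representable, so `err` is nonzero, though tiny relative to `p`.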
Is it possible to disable the automatic use of FMA instructions on ARM?
Thank you for the suggestion! I just tried the following code, but I got the same result (i.e., failure on ARM):
```ocaml
let test x =
  Sys.opaque_identity (x *. x) -. Sys.opaque_identity (x *. x)
[@@ocaml.inline never]
```
IIUC, the `Sys.opaque_identity` call still appears in the `-dlambda` output but has already been eliminated in `-dcmm`, and FMA instructions are chosen afterwards, during instruction selection (visible in `-dsel`).
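Besides reading the `-dsel` output, one can check the effect at run time. The following is a hypothetical probe (not part of the original thread): if instruction selection fused one of the two products into an FMA, `test x` returns the rounding error of `x *. x`, up to sign, instead of 0; otherwise it returns exactly 0.

```ocaml
(* The attempted workaround from above *)
let test x =
  Sys.opaque_identity (x *. x) -. Sys.opaque_identity (x *. x)
[@@ocaml.inline never]

let () =
  let x = 1. /. 3. in
  (* Rounding error of x *. x, computed with a single explicit FMA *)
  let err = Float.fma x x (-. (x *. x)) in
  let r = test x in
  if r = 0. then print_endline "no FMA contraction observed"
  else if Float.abs r = Float.abs err then
    print_endline "result matches an FMA contraction"
  else print_endline "unexpected result"
```

The sign of the result depends on which product gets fused, hence the comparison of absolute values.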
I have noticed the solution used in Coq: a module `Float64` declaring an abstract type, which is defined as an alias for `float` in the implementation, and which exposes the usual operators on that type. This solution works, at the cost of a function call for each float operation. I am wondering whether we can do better.
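A minimal sketch of that idea follows. It is written in the spirit of Coq's `Float64` but is not its actual code. Since the module lives in the same file here, the sketch assumes `[@@ocaml.inline never]` is needed to keep the operators from being inlined: the abstract type alone does not stop `ocamlopt` from inlining small functions. As long as each operator remains a real call, instruction selection never sees a multiplication feeding directly into a subtraction, so nothing can be fused.

```ocaml
module Float64 : sig
  type t
  val of_float : float -> t
  val to_float : t -> float
  val ( +. ) : t -> t -> t
  val ( -. ) : t -> t -> t
  val ( *. ) : t -> t -> t
  val ( /. ) : t -> t -> t
end = struct
  type t = float
  let of_float x = x
  let to_float x = x
  (* Each operator is kept out of line, so its single rounded result is
     returned before the next operation starts. *)
  let ( +. ) x y = Stdlib.( +. ) x y [@@ocaml.inline never]
  let ( -. ) x y = Stdlib.( -. ) x y [@@ocaml.inline never]
  let ( *. ) x y = Stdlib.( *. ) x y [@@ocaml.inline never]
  let ( /. ) x y = Stdlib.( /. ) x y [@@ocaml.inline never]
end

(* The example from the question, rewritten on Float64.t *)
let test x =
  let open Float64 in
  x *. x -. x *. x
[@@ocaml.inline never]

let () =
  let x = Float64.of_float (1. /. 3.) in
  assert (Float64.to_float (test x) = 0.);
  print_endline "ok"
```

The operators keep the usual names, so after a local `let open Float64 in`, existing arithmetic expressions read unchanged. The price, as noted above, is one call per operation.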
Thanks @silene for finding this thread! Indeed, the code succeeds on OCaml 4.13. What is still surprising is that I observe the above-mentioned failure on OCaml 4.12 (OCaml 4.11 and below are simply unavailable on Mac M1 according to opam), and I didn’t find anything related in `Changes` between 4.12 and 4.13.
Thanks again @nojb and @silene, I now have a solution for disabling FMA!