`x *. y +. z` is systematically compiled into an `fma` on ARM

In section 14.5, "Compatibility with the bytecode compiler", of the manual chapter "Native-code compilation (ocamlopt)", we read:

> On ARM and PowerPC processors (32 and 64 bits), fused multiply-add (FMA) instructions can be generated for a floating-point multiplication followed by a floating-point addition or subtraction, as in `x *. y +. z`. The FMA instruction avoids rounding the intermediate result `x *. y`, which is generally beneficial, but produces floating-point results that differ slightly from those produced by the bytecode interpreter.

And indeed, on a Mac M1, the following code fails when compiled to native code, whereas it succeeds on x86 or when compiled with the bytecode compiler:

```ocaml
let test x =
  x *. x -. x *. x
  [@@ocaml.inline never]

let () =
  assert (test (1. /. 3.) = 0.)
```
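To see why the assertion fails, here is a small sketch that makes the fusion explicit with the standard library's `Float.fma` (in the `Float` module since OCaml 4.08). It assumes that the native ARM code evaluates something equivalent to `Float.fma x x (-. (x *. x))`: one product is rounded, the other is kept exact inside the FMA, so the result is the rounding error of `x *. x` instead of 0. The names `p` and `err` are only for illustration.

```ocaml
(* With x = 1/3, the exact square of x is not representable as a double,
   so rounding [x *. x] loses information. *)
let x = 1. /. 3.

(* Rounded product: bytecode (and x86 native code) uses this value for both
   operands of [x *. x -. x *. x], so their difference is exactly 0. *)
let p = x *. x

(* Fused multiply-add: computes x * x - p with a single rounding. This is
   presumably what the ARM native code ends up evaluating, and it yields the
   (nonzero) rounding error of [p]. *)
let err = Float.fma x x (-. p)

let () =
  Printf.printf "rounded product: %.17g\nrounding error:  %.17g\n" p err
```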

Is it possible to disable the automatic use of FMA instructions on ARM?


I don’t have an ARM machine to test it on, but have you tried wrapping some of the intermediate terms with `Sys.opaque_identity`?

Cheers,
Nicolas

Thank you for the suggestion! I just tried the code below, but I still get the same result (i.e., failure on ARM):

```ocaml
let test x =
  Sys.opaque_identity (x *. x) -. Sys.opaque_identity (x *. x)
  [@@ocaml.inline never]
```

IIUC, the `opaque` call appears in the `-dlambda` output but has already been eliminated in the `-dcmm` output, and the choice of FMA instructions happens afterwards, at the instruction selection level (visible with `-dsel`).

I have noticed the solution used in Coq: a module `Float64` declares an abstract type, defined as an alias to `float` in the implementation, and exposes the usual operators on it. This solution works, at the cost of a function call for each float operation. I am wondering whether we can do better.
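A minimal sketch of that approach, written from the description above rather than copied from Coq's sources (the names `Float64`, `of_float`, `to_float`, `mul`, `sub` and `add` are hypothetical here). The `[@@ocaml.inline never]` attributes are an assumption of this sketch: they make the function-call boundary explicit, so instruction selection never sees a multiplication feeding directly into an addition or subtraction.

```ocaml
(* Hypothetical Coq-style wrapper: [t] is abstract in the signature but is
   just [float] in the implementation. *)
module Float64 : sig
  type t
  val of_float : float -> t
  val to_float : t -> float
  val mul : t -> t -> t
  val sub : t -> t -> t
  val add : t -> t -> t
end = struct
  type t = float
  let of_float x = x
  let to_float x = x
  (* Keeping each operation behind a real call (assumption of this sketch)
     prevents the multiply and the add/sub from being fused into an FMA. *)
  let mul x y = x *. y [@@ocaml.inline never]
  let sub x y = x -. y [@@ocaml.inline never]
  let add x y = x +. y [@@ocaml.inline never]
end

let test x =
  let x = Float64.of_float x in
  Float64.(to_float (sub (mul x x) (mul x x)))
  [@@ocaml.inline never]

let () = assert (test (1. /. 3.) = 0.)
```

The price is exactly the one mentioned above: every `mul`, `sub` and `add` is an out-of-line call, and the float may need to be boxed across it.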


Which version of OCaml are you using? According to ocaml/ocaml issue #10323, "double-double" floating point algorithm tricked by pernickety hardware (now closed), using `Sys.opaque_identity` should prevent the generation of FMA instructions on OCaml 4.11.

Thanks @silene for finding this thread! Indeed, the code succeeds on OCaml 4.13. What is still surprising is that I observe the failure mentioned above on OCaml 4.12 (OCaml 4.11 and earlier are simply unavailable on Mac M1 according to opam), and I didn’t find anything related in `Changes` between 4.12 and 4.13.

Thanks again @nojb and @silene, I now have a solution for disabling FMA!