`x *. y +. z` is systematically compiled into an `fma` on ARM

thierry-martinez · December 14, 2021, 4:03pm

In section 14.5, "Compatibility with the bytecode compiler", of the OCaml manual chapter on native-code compilation (ocamlopt), we read:

On ARM and PowerPC processors (32 and 64 bits), fused multiply-add (FMA) instructions can be generated for a floating-point multiplication followed by a floating-point addition or subtraction, as in x *. y +. z. The FMA instruction avoids rounding the intermediate result x *. y, which is generally beneficial, but produces floating-point results that differ slightly from those produced by the bytecode interpreter.

and, indeed, on a Mac M1, the following code fails when compiled to native code, whereas it succeeds on x86 or when compiled with the bytecode compiler:

let test x =
  x *. x -. x *. x
  [@@ocaml.inline never]

let () =
  assert (test (1. /. 3.) = 0.)
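
To see where the nonzero result comes from, here is a minimal sketch that applies the fused operation by hand with Float.fma (available since OCaml 4.08), so the effect can be observed on any platform. The names rounded_product and residual are only illustrative. The fused operation keeps x * x exact internally, so subtracting the already rounded product leaves the rounding error of the multiplication, which is nonzero for 1/3.

```ocaml
(* Requires OCaml >= 4.08 for Float.fma. *)
let x = 1. /. 3.

(* The product, rounded to double precision. *)
let rounded_product = x *. x

(* Explicit fused multiply-add: x * x is not rounded before the
   rounded product is subtracted; only the final result is rounded. *)
let residual = Float.fma x x (-. rounded_product)

let () =
  Printf.printf "x *. x                    = %h\n" rounded_product;
  Printf.printf "fma x x (-. (x *. x))     = %h\n" residual
```

This presumably mirrors what the ARM backend does to one of the two products in test, which is why the assertion above fails there.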

Is it possible to disable the automatic use of FMA instructions on ARM?

nojb · December 14, 2021, 4:42pm

I don’t have an ARM machine to test it on, but have you tried wrapping some of the intermediate terms with Sys.opaque_identity?

Cheers,
Nicolas

thierry-martinez · December 14, 2021, 10:42pm

Thank you for the suggestion! I just tried the code below, but I got the same result (i.e., failure on ARM):

let test x =
  Sys.opaque_identity (x *. x) -. Sys.opaque_identity (x *. x)
  [@@ocaml.inline never]

IIUC, the opaque function appears in -dlambda but is already eliminated in -dcmm, and the choice of FMA instructions is performed afterwards, at instruction selection level (visible in -dsel).

I have noticed the solution used in Coq: a module Float64 declaring an abstract type, which is defined as an alias to float in the implementation, and the module exposes the usual operators on it. This solution works, at the cost of paying a function call for each float operation. I am wondering if we can do better.
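
A minimal sketch of that abstract-type approach could look as follows. This is not Coq's actual code: the interface is reduced to the two operations needed here, and the [@@ocaml.inline never] attributes are my own assumption, added so that each operation stays a real function call and the multiplication cannot be fused with the subtraction.

```ocaml
module Float64 : sig
  type t
  val of_float : float -> t
  val to_float : t -> float
  val mul : t -> t -> t
  val sub : t -> t -> t
end = struct
  (* Abstract outside the module, a plain float inside. *)
  type t = float
  let of_float x = x
  let to_float x = x
  (* Assumption: never inlined, so the product is rounded and
     returned before any subtraction sees it. *)
  let mul x y = x *. y [@@ocaml.inline never]
  let sub x y = x -. y [@@ocaml.inline never]
end

let test x =
  let x = Float64.of_float x in
  Float64.to_float Float64.(sub (mul x x) (mul x x))

let () =
  assert (test (1. /. 3.) = 0.)
```

The price is the one mentioned above: every arithmetic operation goes through a call.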

silene · December 15, 2021, 4:09pm

Which version of OCaml are you using? According to ocaml/ocaml issue #10323 on GitHub, "“double-double” floating point algorithm tricked by pernickety hardware" (now closed), using Sys.opaque_identity should work to prevent the generation of FMA on OCaml 4.11.

thierry-martinez · December 15, 2021, 9:30pm

Thanks @silene for finding this thread! Indeed, the code succeeds on OCaml 4.13. What is still surprising is that I observe the above-mentioned failure on OCaml 4.12 (OCaml 4.11 and below are just unavailable on Mac M1 according to opam), and I didn’t find anything related in the Changes between 4.12 and 4.13.

Thanks again @nojb and @silene, I now have a solution for disabling FMA!
