 # Flambda float limitation and workarround

Hello,

I was playing with flambda and inlining for some work in algebra. I implemented a signature for Field and a few modules like vectors and matrices which I instanciate after with mlgmp, zarith rational and float. Unfortunately, flambda does not yet reaches the performance of the defunctorised code … But in fact, it is just because the optimisation related to the type float are not propagated. I really do not want to duplicate my code just for the sake of optimisation.

The code atached below shows that just by giving the type annotation “float” solves the problem so I use a dune rule like this:

``````(rule
(target FVector.ml)
(deps Vector.ml)
(action (with-stdout-to %{target}
(run sed "s/Make(R:S)/Make(R:S with type t = float)/" %{deps}))))
``````

And use `FVector` instead of `Vector` when using `Float`.

I attach a demonstrating example (with manual annotation) below which gives the following timings when compiled with ocamlopt -O 3

module without type annotation versus float code
: tf = 4.78, tg = 1.83, factor = 2.61x, gain = 61.67%
module with type annotation versus float code
: tf = 1.83, tg = 1.71, factor = 1.07x, gain = 6.91%

This shows that the functorized code without type information is around 2.5 slower than the specialized code,
but just the tpe annotation completely solves the problem.

Here is the file:

``````module type Ring =
sig
type t
val zero : t
val one : t
val ( + ) : t -> t -> t
val ( - ) : t -> t -> t
val ( ~-) : t -> t
val ( * ) : t -> t -> t
val ( / ) : t -> t -> t
val of_int : int -> t
end

module type Module =
sig
module F : Ring
type f = F.t
type t
val dim : int
val zero : t
val ( + ) : t -> t -> t
val ( - ) : t -> t -> t
val ( * ) : f -> t -> t
val dot : t -> t -> f
end

module ArrayModule (F : Ring) (D: sig val dim:int end) =
struct
module F = F
type f = F.t
type t = f array
let dim = D.dim
let zero = Array.make dim F.zero
let ( + ) = Array.map2 F.(+)
let ( - ) = Array.map2 F.(-)
let [@inlined] ( * ) = fun x -> Array.map (F.( * ) x)

let [@inlined] dot v1 v2 =
let open F in
let r = ref F.zero in
Array.iteri (fun i x -> r := x * v2.(i) + !r) v1;
!r

end

module FArrayModule (F : Ring with type t = float) (D: sig val dim:int end) =
struct
module F = F
type f = F.t
type t = f array
let dim = D.dim
let zero = Array.make dim F.zero
let ( + ) = Array.map2 F.(+)
let ( - ) = Array.map2 F.(-)
let [@inlined] ( * ) = fun x -> Array.map (F.( * ) x)

let [@inlined] dot v1 v2 =
let open F in
let r = ref F.zero in
Array.iteri (fun i x -> r := x * v2.(i) + !r) v1;
!r

end

module Float =
struct
type t = float
let zero = 0.
let one = 1.
let ( + ) = ( +. ) [@@inlined]
let ( - ) = ( -. ) [@@inlined]
let ( ~-) = ( ~-.) [@@inlined]
let ( * ) = ( *. ) [@@inlined]
let ( / ) = ( /. ) [@@inlined]
let of_int = float_of_int
end

module Dim = struct let dim = 10_000_000 end

module VFloat = ArrayModule(Float)[@inlined](Dim)
module FVFloat = FArrayModule(Float)[@inlined](Dim)

(* Timing *)
let chrono f x =
Gc.compact ();
let t1 = Sys.time () in
let r = f x in
let t2 = Sys.time () in
(t2 -. t1, r)

let chronos msg f g x =
Gc.full_major (); (* GC to avoid paying for the allocation of the other *)
let tf, lf = chrono f x in
Gc.full_major ();
let tg, lg = chrono g x in
assert(lf = lg);
let g = 100. *. (tf -. tg) /. tf in
let f = tf /. tg in
Printf.printf "%s: tf = %.2f, tg = %.2f, factor = %.2fx, gain = %.2f%%\n%!" msg tf tg f g

let compute f v =
let r = ref 0.0 in
for _ = 0 to 100 do
r := f v v +. !r
done

let fdot v1 v2 =
let r = ref 0.0 in
Array.iteri (fun i x -> r := x *. v2.(i) +. !r) v1;
!r

let v = Array.init VFloat.dim (fun _ -> Random.float 1.0)

let _ = chronos "module without type annotation versus float code\n\t" (compute VFloat.dot) (compute fdot) v
let _ = chronos "module with    type annotation versus float code\n\t" (compute FVFloat.dot) (compute fdot) v
``````

The most likely explanation is the flat-float-array optimisation: without the type annotation, arrays are represented as generic OCaml arrays, so each cell is a pointer to an allocated float value. With the type annotation, the compiler can use a specific layout for float arrays, where each cell is an unboxed float number.

The usual workaround for cases like yours is to add an `Array` submodule to the module type used as functor argument. In your case, this could look like:

``````module type Ring =
sig
type t
module Array : sig
type elt = t
type t
val length : t -> int
val get : t -> int -> elt
val set : t -> int -> elt -> unit
(* Complete with the functions that you need *)
end
val zero : t
val one : t
val ( + ) : t → t → t
val ( - ) : t → t → t
val ( ~-) : t → t
val ( * ) : t → t → t
val ( / ) : t → t → t
val of_int : int → t
end

module ArrayModule (F : Ring) (D: sig val dim:int end) =
struct
module F = F
type f = F.t
type t = F.Array.t
let dim = D.dim
let zero = F.Array.make dim F.zero
let ( + ) = F.Array.map2 F.(+)
let ( - ) = F.Array.map2 F.(-)
let [@inline] ( * ) = fun x → F.Array.map (F.( * ) x)

let [@inline] dot v1 v2 =
let open F in
(* Note: this open statement shadows the Array module from the standard library.
This means that [v.(i)] now resolves to [F.Array.get v i], which is what we want. *)
let r = ref F.zero in
Array.iteri (fun i x -> r := x * v2.(i) + !r) v1;
!r

end

module Float =
struct
type t = float
module Array = struct
type elt = float
include Float.Array
end
let zero = 0.
let one = 1.
let ( + ) x y = ( +. ) x y [@@inline]
let ( - ) x y = ( -. ) x y [@@inline]
let ( ~-) x y = ( ~-.) x y [@@inline]
let ( * ) x y = ( *. ) x y [@@inline]
let ( / ) x y = ( /. ) x y [@@inline]
let of_int x = float_of_int x
end
``````

This approach also works if your compiler is configured with `no-flat-float-array`, which is not the case of your `FArrayModule` functor.

1 Like

Thanks for your solution. I was not aware of it. I still find it impact to much my code and it does not handle other cases like float field in records and may be (I am less sure) passing of float arguments by register.

I also noticed you use

``````let ( + ) x y = ( +. ) x y [@@inline]
``````

``````let ( + ) = ( +. )  [@@inline]
Does it makes a difference ? As `+.` is a primitive I thought it was eta-expended ?
It is eventually expanded, but too late. The `[@@inline]` annotation needs to be attached to a function during parsing, and the eta-expansion occurs after type-checking.