Flambda float limitation and workarround

craff · October 11, 2021, 4:26am

Hello,

I was playing with flambda and inlining for some work in algebra. I implemented a signature for Field and a few modules like vectors and matrices which I instanciate after with mlgmp, zarith rational and float. Unfortunately, flambda does not yet reaches the performance of the defunctorised code … But in fact, it is just because the optimisation related to the type float are not propagated. I really do not want to duplicate my code just for the sake of optimisation.

The code atached below shows that just by giving the type annotation “float” solves the problem so I use a dune rule like this:

(rule
   (target FVector.ml)
   (deps Vector.ml)
   (action (with-stdout-to %{target}
     (run sed "s/Make(R:S)/Make(R:S with type t = float)/" %{deps}))))

And use FVector instead of Vector when using Float.

I attach a demonstrating example (with manual annotation) below which gives the following timings when compiled with ocamlopt -O 3

module without type annotation versus float code
: tf = 4.78, tg = 1.83, factor = 2.61x, gain = 61.67%
module with type annotation versus float code
: tf = 1.83, tg = 1.71, factor = 1.07x, gain = 6.91%

This shows that the functorized code without type information is around 2.5 slower than the specialized code,
but just the tpe annotation completely solves the problem.

Here is the file:

module type Ring =
  sig
    type t
    val zero : t
    val one : t
    val ( + ) : t -> t -> t
    val ( - ) : t -> t -> t
    val ( ~-) : t -> t
    val ( * ) : t -> t -> t
    val ( / ) : t -> t -> t
    val of_int : int -> t
  end

module type Module =
  sig
    module F : Ring
    type f = F.t
    type t
    val dim : int
    val zero : t
    val ( + ) : t -> t -> t
    val ( - ) : t -> t -> t
    val ( * ) : f -> t -> t
    val dot : t -> t -> f
  end

module ArrayModule (F : Ring) (D: sig val dim:int end) =
  struct
    module F = F
    type f = F.t
    type t = f array
    let dim = D.dim
    let zero = Array.make dim F.zero
    let ( + ) = Array.map2 F.(+)
    let ( - ) = Array.map2 F.(-)
    let [@inlined] ( * ) = fun x -> Array.map (F.( * ) x)

    let [@inlined] dot v1 v2 =
      let open F in
      let r = ref F.zero in
      Array.iteri (fun i x -> r := x * v2.(i) + !r) v1;
      !r

  end

module FArrayModule (F : Ring with type t = float) (D: sig val dim:int end) =
  struct
    module F = F
    type f = F.t
    type t = f array
    let dim = D.dim
    let zero = Array.make dim F.zero
    let ( + ) = Array.map2 F.(+)
    let ( - ) = Array.map2 F.(-)
    let [@inlined] ( * ) = fun x -> Array.map (F.( * ) x)

    let [@inlined] dot v1 v2 =
      let open F in
      let r = ref F.zero in
      Array.iteri (fun i x -> r := x * v2.(i) + !r) v1;
      !r

  end


module Float =
  struct
    type t = float
    let zero = 0.
    let one = 1.
    let ( + ) = ( +. ) [@@inlined]
    let ( - ) = ( -. ) [@@inlined]
    let ( ~-) = ( ~-.) [@@inlined]
    let ( * ) = ( *. ) [@@inlined]
    let ( / ) = ( /. ) [@@inlined]
    let of_int = float_of_int
  end

module Dim = struct let dim = 10_000_000 end

module VFloat = ArrayModule(Float)[@inlined](Dim)
module FVFloat = FArrayModule(Float)[@inlined](Dim)

(* Timing *)
let chrono f x =
  Gc.compact ();
  let t1 = Sys.time () in
  let r = f x in
  let t2 = Sys.time () in
  (t2 -. t1, r)

let chronos msg f g x =
  Gc.full_major (); (* GC to avoid paying for the allocation of the other *)
  let tf, lf = chrono f x in
  Gc.full_major ();
  let tg, lg = chrono g x in
  assert(lf = lg);
  let g = 100. *. (tf -. tg) /. tf in
  let f = tf /. tg in
  Printf.printf "%s: tf = %.2f, tg = %.2f, factor = %.2fx, gain = %.2f%%\n%!" msg tf tg f g

let compute f v =
  let r = ref 0.0 in
  for _ = 0 to 100 do
    r := f v v +. !r
  done

let fdot v1 v2 =
  let r = ref 0.0 in
  Array.iteri (fun i x -> r := x *. v2.(i) +. !r) v1;
  !r

let v = Array.init VFloat.dim (fun _ -> Random.float 1.0)

let _ = chronos "module without type annotation versus float code\n\t" (compute VFloat.dot) (compute fdot) v
let _ = chronos "module with    type annotation versus float code\n\t" (compute FVFloat.dot) (compute fdot) v

vlaviron · October 11, 2021, 5:42am

The most likely explanation is the flat-float-array optimisation: without the type annotation, arrays are represented as generic OCaml arrays, so each cell is a pointer to an allocated float value. With the type annotation, the compiler can use a specific layout for float arrays, where each cell is an unboxed float number.

The usual workaround for cases like yours is to add an Array submodule to the module type used as functor argument. In your case, this could look like:

module type Ring =
sig
  type t
  module Array : sig
    type elt = t
    type t
    val length : t -> int
    val get : t -> int -> elt
    val set : t -> int -> elt -> unit
    (* Complete with the functions that you need *)
  end
  val zero : t
  val one : t
  val ( + ) : t → t → t
  val ( - ) : t → t → t
  val ( ~-) : t → t
  val ( * ) : t → t → t
  val ( / ) : t → t → t
  val of_int : int → t
end

module ArrayModule (F : Ring) (D: sig val dim:int end) =
struct
  module F = F
  type f = F.t
  type t = F.Array.t
  let dim = D.dim
  let zero = F.Array.make dim F.zero
  let ( + ) = F.Array.map2 F.(+)
  let ( - ) = F.Array.map2 F.(-)
  let [@inline] ( * ) = fun x → F.Array.map (F.( * ) x)

  let [@inline] dot v1 v2 =
    let open F in
    (* Note: this open statement shadows the Array module from the standard library.
       This means that [v.(i)] now resolves to [F.Array.get v i], which is what we want. *)
    let r = ref F.zero in
    Array.iteri (fun i x -> r := x * v2.(i) + !r) v1;
    !r

end

module Float =
struct
  type t = float
  module Array = struct
    type elt = float
    include Float.Array
 end
  let zero = 0.
  let one = 1.
  let ( + ) x y = ( +. ) x y [@@inline]
  let ( - ) x y = ( -. ) x y [@@inline]
  let ( ~-) x y = ( ~-.) x y [@@inline]
  let ( * ) x y = ( *. ) x y [@@inline]
  let ( / ) x y = ( /. ) x y [@@inline]
  let of_int x = float_of_int x
end

This approach also works if your compiler is configured with no-flat-float-array, which is not the case of your FArrayModule functor.

craff · October 12, 2021, 1:12am

Thanks for your solution. I was not aware of it. I still find it impact to much my code and it does not handle other cases like float field in records and may be (I am less sure) passing of float arguments by register.

I also noticed you use

let ( + ) x y = ( +. ) x y [@@inline]

instead of

let ( + ) = ( +. )  [@@inline]

Does it makes a difference ? As +. is a primitive I thought it was eta-expended ?

Cheers,
Christophe

vlaviron · October 12, 2021, 6:58am

It is eventually expanded, but too late. The [@@inline] annotation needs to be attached to a function during parsing, and the eta-expansion occurs after type-checking.
You should get a warning 53 (misplaced attribute) with your version.

Topic		Replies	Views
Difference in normal ocaml and flambda ocaml Ecosystem	6	1009	May 26, 2023
What Lambda flags do you usually use to speed up your code? Ecosystem flambda , optimization	6	1384	February 3, 2022
"flambda for newbies" questions Learning jbuilder , flambda	3	3199	September 19, 2017
Can a state monad be optimized out with flambda Learning flambda	6	1082	May 16, 2022
Function call not inlined Learning flambda	17	902	May 5, 2022

Flambda float limitation and workarround

Related topics