`float array` and `floatarray` have similar performance in my benchmark

I have head (from RWO) that float array has been special-cased in the OCaml runtime to store unboxed values.

I wrote a simple benchmark program to try to identify what the impact of this optimization is and where it might be better to use floatarray rather than float array.

Here is the code:

module NormalArray = struct
  (** normal recursive sum definition *)
  let rec rec_sum acc i fa =
    if i >= Array.length fa then acc
    else rec_sum (acc +. Array.unsafe_get fa i) (i + 1) fa
  let rec_sum = rec_sum 0. 0

  (** fold-based sum definition *)
  let fold_sum fa = Array.fold_left ( +. ) 0.0 fa
  let imp_sum fa =
    let total = ref 0.0 in
    for i = 0 to Array.length fa do
      total := !total +. Array.unsafe_get fa i
    done;
    !total
end

module Floater = struct
  (** normal recursive sum definition *)
  let rec rec_sum acc i fa =
    if i >= Float.Array.length fa then acc
    else rec_sum (acc +. Float.Array.unsafe_get fa i) (i + 1) fa
  let rec_sum = rec_sum 0. 0

  (** fold-based sum definition *)
  let fold_sum fa = Float.Array.fold_left ( +. ) 0.0 fa
  let imp_sum fa =
    let total = ref 0.0 in
    for i = 0 to Float.Array.length fa do
      total := !total +. Float.Array.unsafe_get fa i
    done;
    !total
end

let timeit ?(times=1) thunk =
  let start = Sys.time () in
  for _ = 1 to times do
    thunk ()
  done;
  let stop = Sys.time () in
  stop -. start

let report_time ?(times=1) message thunk =
  let time = timeit ~times thunk in
  Printf.printf "%s: %f\n" message time;
  flush stdout

let array_of_floats = Array.make 500_000_000 1.0
let float_array = Float.Array.make 500_000_000 1.0

let thunk f fa () =
  ignore (f fa)


let () =
  let t = report_time in
  t "rec_sum" @@ thunk NormalArray.rec_sum array_of_floats;
  t "fold_sum" @@ thunk NormalArray.fold_sum array_of_floats;
  t "imp_sum" @@ thunk NormalArray.imp_sum array_of_floats;
  print_newline ();
  t "rec_sum" @@ thunk Floater.rec_sum float_array;
  t "fold_sum" @@ thunk Floater.fold_sum float_array;
  t "imp_sum" @@ thunk Floater.imp_sum float_array;

Here is the output:

rec_sum: 1.270130
fold_sum: 1.544846
imp_sum: 0.607575

rec_sum: 1.264709
fold_sum: 1.644644
imp_sum: 0.616881

In my benchmark, float array is actually very slightly faster than floatarray in all cases.
Is there ever a time when it is better to use floatarray, given that float array is now unboxed? Is floatarray just arround for legacy reasons? Are my benchmarks faulty?

The advantages of floatarray over float array are:

  • Memory use: float array of size N uses 3N words of storage vs just N words for the corresponding floatarray.
  • floatarrays can be passed directly to C functions expecting C-style arrays since they share the same data layout.

The main downside is that accessing an element of a floatarray involves a float allocation. In any case, I don’t think your benchmark tests these aspects.

floatarray is not legacy, and is very much appropriate when writing heavy-duty numerical code in OCaml.

Cheers,
Nicolas

1 Like

So, this is from RWO:

Since each floating-point value is boxed in a separate memory block, it can be inefficient to handle large arrays of floats in comparison to unboxed integers. OCaml therefore special-cases records or arrays that contain only float types. These are stored in a block that contains the floats packed directly in the data section, with Double_array_tag set to signal to the collector that the contents are not OCaml values.

Given that, I don’t understand what you mean that teach element in a float array takes three words. I would take this to mean that machine floats are stored inline in float array, not as OCaml’s boxed float type.

What am I missing here?

Currently, float array and floatarray are using the same internal representation, so the performances should be identical in most cases. But in the future, float array will no longer store unboxed values (so that its representation is always the same as 'a array), so it will become slower than floatarray.

2 Likes

Wow. I’m kind of surprised they are deprecating this optimization. Is there a mailing list thread or some other link where I can read about the rational for this decision? It’s easy to see why they would optimize this case. It’s more difficult to see why they would “unoptimize” it, from a lay person’s perspective, other than the fact that two array implementations internally is “ugly”. Does it cause some kind of slowdown due to runtime tag checking?

Yes, this optimization of float array is a pessimization of 'a array, when 'a is not known at compile time, due to runtime tag checking.

3 Likes

The compiler can already be built to use the fully boxed representation for float array and the flat representation for floatarray. This is done by passing the --disable-flat-float-array flag to the configure script. My answer should be interpreted in that context.

The tradeoffs of this optimization have been discussed at length in various places, eg

https://www.lexifi.com/blog/ocaml/about-unboxed-float-arrays/

Incidentally, we are already using this mode at LexiFi (and have been, for some time), and we documented how we migrated our codebase to use the floatarray type:

https://www.lexifi.com/blog/ocaml/floatarray-migration/

Cheers,
Nicolas

4 Likes

This seems like pretty important context. :sweat_smile:

Thanks for the links!

Also, it has not really been decided if the runtime unboxing for float arrays will be deprecated or not.

Moreover, if it happens, it will be after a long series of changes to introduce more general optimized and unboxed types to OCaml.

In other words, you should not expect this deprecation to happen soon.

1 Like

also available as compiler option ocaml-option-no-flat-float-array in opam.