I have head (from RWO) that float array has been special-cased in the OCaml runtime to store unboxed values.
I wrote a simple benchmark program to try to identify what the impact of this optimization is and where it might be better to use floatarray rather than float array.
Here is the code:
module NormalArray = struct
(** normal recursive sum definition *)
let rec rec_sum acc i fa =
if i >= Array.length fa then acc
else rec_sum (acc +. Array.unsafe_get fa i) (i + 1) fa
let rec_sum = rec_sum 0. 0
(** fold-based sum definition *)
let fold_sum fa = Array.fold_left ( +. ) 0.0 fa
let imp_sum fa =
let total = ref 0.0 in
for i = 0 to Array.length fa do
total := !total +. Array.unsafe_get fa i
done;
!total
end
module Floater = struct
(** normal recursive sum definition *)
let rec rec_sum acc i fa =
if i >= Float.Array.length fa then acc
else rec_sum (acc +. Float.Array.unsafe_get fa i) (i + 1) fa
let rec_sum = rec_sum 0. 0
(** fold-based sum definition *)
let fold_sum fa = Float.Array.fold_left ( +. ) 0.0 fa
let imp_sum fa =
let total = ref 0.0 in
for i = 0 to Float.Array.length fa do
total := !total +. Float.Array.unsafe_get fa i
done;
!total
end
let timeit ?(times=1) thunk =
let start = Sys.time () in
for _ = 1 to times do
thunk ()
done;
let stop = Sys.time () in
stop -. start
let report_time ?(times=1) message thunk =
let time = timeit ~times thunk in
Printf.printf "%s: %f\n" message time;
flush stdout
let array_of_floats = Array.make 500_000_000 1.0
let float_array = Float.Array.make 500_000_000 1.0
let thunk f fa () =
ignore (f fa)
let () =
let t = report_time in
t "rec_sum" @@ thunk NormalArray.rec_sum array_of_floats;
t "fold_sum" @@ thunk NormalArray.fold_sum array_of_floats;
t "imp_sum" @@ thunk NormalArray.imp_sum array_of_floats;
print_newline ();
t "rec_sum" @@ thunk Floater.rec_sum float_array;
t "fold_sum" @@ thunk Floater.fold_sum float_array;
t "imp_sum" @@ thunk Floater.imp_sum float_array;
In my benchmark, float array is actually very slightly faster than floatarray in all cases.
Is there ever a time when it is better to use floatarray, given that float array is now unboxed? Is floatarray just arround for legacy reasons? Are my benchmarks faulty?
The advantages of floatarray over float array are:
Memory use: float array of size N uses 3N words of storage vs just N words for the corresponding floatarray.
floatarrays can be passed directly to C functions expecting C-style arrays since they share the same data layout.
The main downside is that accessing an element of a floatarray involves a float allocation. In any case, I don’t think your benchmark tests these aspects.
floatarray is not legacy, and is very much appropriate when writing heavy-duty numerical code in OCaml.
Since each floating-point value is boxed in a separate memory block, it can be inefficient to handle large arrays of floats in comparison to unboxed integers. OCaml therefore special-cases records or arrays that contain onlyfloat types. These are stored in a block that contains the floats packed directly in the data section, with Double_array_tag set to signal to the collector that the contents are not OCaml values.
Given that, I don’t understand what you mean that teach element in a float array takes three words. I would take this to mean that machine floats are stored inline in float array, not as OCaml’s boxed float type.
Currently, float array and floatarray are using the same internal representation, so the performances should be identical in most cases. But in the future, float array will no longer store unboxed values (so that its representation is always the same as 'a array), so it will become slower than floatarray.
Wow. I’m kind of surprised they are deprecating this optimization. Is there a mailing list thread or some other link where I can read about the rational for this decision? It’s easy to see why they would optimize this case. It’s more difficult to see why they would “unoptimize” it, from a lay person’s perspective, other than the fact that two array implementations internally is “ugly”. Does it cause some kind of slowdown due to runtime tag checking?
The compiler can already be built to use the fully boxed representation for float array and the flat representation for floatarray. This is done by passing the --disable-flat-float-array flag to the configure script. My answer should be interpreted in that context.
The tradeoffs of this optimization have been discussed at length in various places, eg
Incidentally, we are already using this mode at LexiFi (and have been, for some time), and we documented how we migrated our codebase to use the floatarray type: