Hi! I’ve recently encountered some code that initializes a List using a int list ref
and a for loop, instead of using the built-in List.init.
I’ve therefore decided to rewrite it, and then wanted to run comparison to see how much speed I gained. Surprise surprise, using List.init is slower!!
Would anyone be able to explain to me this behavior?
Here are the two functions:
let possibilities i j seed =
let pos = ref [] in
for k = 0 to 7 do
let v = seed.(k) in
let b1 = v / 4 in
let b2 = (v - (b1 * 4)) / 2 in
let b3 = v - (b1 * 4) - (b2 * 2) in
pos :=
(i + (((2 * b1) - 1) * (b3 + 1)), j + (((2 * b2) - 1) * (2 - b3))) :: !pos
done;
!pos
let possibilities i j seed =
List.init 8 (fun k ->
let v = seed.(7 - k) in
let b1 = v / 4 in
let b2 = (v - (b1 * 4)) / 2 in
let b3 = v - (b1 * 4) - (b2 * 2) in
(i + (((2 * b1) - 1) * (b3 + 1)), j + (((2 * b2) - 1) * (2 - b3))))
In both cases, I then test the function by running:
let seed = [| 0; 7; 2; 3; 1; 6; 4; 5 |]
let () =
for i = 0 to int_of_string Sys.argv.(1) do
ignore (possibilities (Random.int 100) (Random.int 100) seed)
done
I don’t know what is the new tail_mod_cons implementation of List.init in the stdlib ; maybe you could try another implementation of List.init. The most simple one is:
let list_init n f =
let rec aux i acc =
if i >= n then List.rev acc
else aux (i+1) ((f i)::acc)
in
aux 0 []
There is another optimisation that is available in the ext-lib, which avoids the last List.rev, that you could try and compare.
I noticed in the past (don’t know if it’s still like that) that the number of parameters of a function also has a cost in ocaml. In the std-lib, aux functions are extern aux functions in stead of local aux functions, which alows to reduce the number of parameters.
With an other implementation of list_init, I get a smaller time than with the for loop.
And if you’re trying to optimise, you can also use Array.unsafe_get which avoids bounds checks.
Experts will be able to give you an in-depth answer, but generally speaking, if you use List.init you pay a price for calling a function for each element of the array, and furthermore, the function has free variables which require some extra memory indirections to be accessed from inside the closure.
Moreover, if you are using List.init that uses TRMC, note that there are some subtle promotion memory effects that may result in a slowdown if the result is short-lived (eg if it is ignored), see the discussion under “Promotion” in https://github.com/ocaml/ocaml/pull/9760.
Coming a bit late, but your intuition that the built-in List.init is faster is wrong. A for-loop updating a non-escaping reference is very fast: the reference is not even allocated (since it doesn’t escape), so you get very close to the minimal amount of instructions needed to create the list. On the other hand, with List.init, you pay the cost of the function calls (which can be recovered by the optimiser to some extent), and even if you remove all of them you end up either allocating the list twice (once in reverse order then once in the right order), with @deca3’s version, or separating the allocations and initialisations, with the tail-mod-cons version from the standard library.
If the standard library exposed a List.rev_init function, then a good optimiser might turn it into code as efficient as the loop, but you need either flambda1’s recursive function specialisation or flambda2’s loopification (see our blog post) for that.