I am not familiar with the use of flambda, so perhaps I have just overlooked this in the documentation. This code is from Oleg Kiselyov’s essay on modularity, “Lightweight static guarantees” (well, a minimized version of the code to highlight the performance issue.) In his paper he is trying to make the point that we can get good performance without sacrificing correctness guarantees, so it would be really nice if I could figure out how to make reality live up to the ideal and make the code actually run fast.
There is a factor of 5 performance difference between these two loops and I am wondering what is going on or how to properly inline the function or configure flambda. I have used -O3
and -rounds 5
separately but I’m just not sure what’s going on. Is there a semantic reason it can’t be inlined and it has to allocate it? Any advice is appreciated.
let [@inline] bcmp i j = if i <= j then Some (i, j) else None
let test () =
let ctr = ref 0 in
let t0 = Unix.gettimeofday () in
for i = 0 to 1_000_000 do
let x = match bcmp i (i/2 +1) with Some (i,j) -> i + j | None -> 0 in
ctr := !ctr + x
done;
let t1 = Unix.gettimeofday () in
Printf.printf "With function call: %f\n" (t1 -. t0);
let ctr = ref 0 in
let t0 = Unix.gettimeofday () in
for i = 0 to 1_000_000 do
let j = (i/2 + 1) in
let x = match if i <=j then Some (i, j) else None with Some (i,j) -> i + j | None -> 0 in
ctr := !ctr + x
done;
let t1 = Unix.gettimeofday () in
Printf.printf "No function call: %f\n" (t1 -. t0);
I also tried defining bcmp locally in the loop body and it didn’t help.