How to inline this function call so it behaves the same

I am not familiar with the use of flambda, so perhaps I have just overlooked this in the documentation. This code is from Oleg Kiselyov’s essay on modularity, “Lightweight static guarantees” (well, a minimized version of the code to highlight the performance issue.) In his paper he is trying to make the point that we can get good performance without sacrificing correctness guarantees, so it would be really nice if I could figure out how to make reality live up to the ideal and make the code actually run fast.

There is a factor of 5 performance difference between these two loops and I am wondering what is going on or how to properly inline the function or configure flambda. I have used -O3 and -rounds 5 separately but I’m just not sure what’s going on. Is there a semantic reason it can’t be inlined and it has to allocate it? Any advice is appreciated.

let [@inline] bcmp i j = if i <= j then Some (i, j) else None

let test () =
  let ctr = ref 0 in
  let t0 = Unix.gettimeofday () in
  for i = 0 to 1_000_000 do
    let x = match bcmp i (i/2 +1) with Some (i,j)  -> i + j | None -> 0 in
    ctr := !ctr + x
  done;
  let t1 = Unix.gettimeofday () in
  Printf.printf "With function call: %f\n" (t1 -. t0);

  let ctr = ref 0 in
  let t0 = Unix.gettimeofday () in
  for i = 0 to 1_000_000 do
    let j = (i/2 + 1) in
    let x = match if i <=j then Some (i, j) else None with Some (i,j)  -> i + j | None -> 0 in
    ctr := !ctr + x
  done;
  let t1 = Unix.gettimeofday () in
  Printf.printf "No function call: %f\n" (t1 -. t0);

I also tried defining bcmp locally in the loop body and it didn’t help.

The issue is not that bcmp is not inlined. The issue is that the inlined code is not being specialized for integer arguments. Instead, it uses the polymorphic comparison, which is costly. If you change the code to

let [@inline] bcmp (i:int) j = if i <= j then Some (i, j) else None

it should work a lot better.

1 Like