OCaml speed comparison - calculating pi with Leibniz - optimize?

yawaramin · May 6, 2023, 6:55pm

I have been nerd-sniped by this speed comparison repo: GitHub - niklas-heer/speed-comparison: A repo which compares the speed of different programming languages.

And implemented the following OCaml version of Leibniz’ algorithm to calculate pi (well, not so much implemented as ported over the Python version):

(* ocamlopt -O2 -o leibniz leibniz.ml && strip leibniz *)

let rec calc sum curr upto =
  if curr >= upto then sum
  else calc (sum +. 1. /. float_of_int curr) (curr + 4) upto

let calc rounds =
  let double = rounds * 2 in
  calc 0. (1 - double) (double + 1)

let () =
  let rounds_txt = open_in_bin "rounds.txt" in
  let rounds = int_of_string (input_line rounds_txt) in
  close_in rounds_txt;

  let pi = 4. *. calc rounds in
  Printf.printf "%.16f\n" pi

When I run this with time ./leibniz I consistently get:

$ time ./leibniz
3.1415926435899633
./leibniz  0.38s user 0.00s system 99% cpu 0.388 total

Am I missing anything obvious? The Go version runs in a fraction of that time:

time ./leibniz_go
3.141592663589326
./leibniz_go  0.13s user 0.01s system 95% cpu 0.142 total

copy · May 6, 2023, 7:31pm

Judging from the generated assembly, the OCaml compiler is not able to unbox the float (for the return value) across the recursive call. You can see the subq $16, %r15 cmpq (%r14), %r15 allocation sequence, as well as memory writes inside of the loop of camlExample__calc_5.

An imperative version does quite a bit better (1.235 vs 2.273 on my machine).

bluddy · May 6, 2023, 7:32pm

Probably the old boxing-of-floats issue.
EDIT: sniped

nojb · May 6, 2023, 7:33pm

To make numeric code go faster, I suggest avoiding recursive calls (which inhibit float unboxing) and sticking to good old while loops, ie something like:

let calc rounds =
  let sum = ref 0. in
  let double = 2 * rounds in
  let curr = ref (1 - double) in
  let upto = double + 1 in
  while !curr < upto do
    sum := !sum +. 1. /. float_of_int !curr;
    curr := !curr + 4
  done;
  !sum

Cheers,
Nicolas

bluddy · May 6, 2023, 7:54pm

So now that we’ve brought this issue up again, is there any work being done to fix it? Will Flambda2 deal with it?

yawaramin · May 6, 2023, 7:57pm

Thanks all! That makes sense.

rbjorklin · May 6, 2023, 9:36pm

Moving float_of_int outside of the loop retires 200M fewer instructions according to perf stat but doesn’t actually speed things up on my end.

vlaviron · May 6, 2023, 10:00pm

Yes, and in some cases yes.
The best way to fix it is to use Jane Street’s unboxed types (proposed as an RFC a while ago, with significant parts already implemented). Without that, Flambda 2 can still unbox the float values in the above example, but it’s based on heuristics so there’s no guarantee that it will keep working if small changes are made. Typically, if the function wasn’t tail-recursive it wouldn’t work.

yawaramin · May 6, 2023, 11:03pm

Just to update here that with the recommended imperative approach the time on my laptop now matches the Go time, 0.13s. Just for fun, I’ll try submitting a PR to that repo.

Topic		Replies	Views
Ocaml Fibonacci is faster compared to lisp/scheme Learning	15	2570	March 27, 2023
Ocaml speed : recursive function optimization Learning ocaml	3	756	December 2, 2023
Add two numbers using lists Learning	2	2344	September 2, 2018
How to speed up this function? Learning	32	3128	August 21, 2022
Ocaml Bytecode performance? Learning	21	2506	October 11, 2023

OCaml speed comparison - calculating pi with Leibniz - optimize?

Related topics