# OCaml speed comparison - calculating pi with Leibniz - optimize?

I have been nerd-sniped by this speed comparison repo: GitHub - niklas-heer/speed-comparison: A repo which compares the speed of different programming languages.

And implemented the following OCaml version of Leibniz’ algorithm to calculate `pi` (well, not so much implemented as ported over the Python version):

``````(* ocamlopt -O2 -o leibniz leibniz.ml && strip leibniz *)

let rec calc sum curr upto =
if curr >= upto then sum
else calc (sum +. 1. /. float_of_int curr) (curr + 4) upto

let calc rounds =
let double = rounds * 2 in
calc 0. (1 - double) (double + 1)

let () =
let rounds_txt = open_in_bin "rounds.txt" in
let rounds = int_of_string (input_line rounds_txt) in
close_in rounds_txt;

let pi = 4. *. calc rounds in
Printf.printf "%.16f\n" pi
``````

When I run this with `time ./leibniz` I consistently get:

``````\$ time ./leibniz
3.1415926435899633
./leibniz  0.38s user 0.00s system 99% cpu 0.388 total
``````

Am I missing anything obvious? The Go version runs in a fraction of that time:

``````time ./leibniz_go
3.141592663589326
./leibniz_go  0.13s user 0.01s system 95% cpu 0.142 total
``````

Judging from the generated assembly, the OCaml compiler is not able to unbox the float (for the return value) across the recursive call. You can see the `subq \$16, %r15 cmpq (%r14), %r15` allocation sequence, as well as memory writes inside of the loop of `camlExample__calc_5`.

An imperative version does quite a bit better (1.235 vs 2.273 on my machine).

2 Likes

Probably the old boxing-of-floats issue.
EDIT: sniped

To make numeric code go faster, I suggest avoiding recursive calls (which inhibit float unboxing) and sticking to good old while loops, ie something like:

``````let calc rounds =
let sum = ref 0. in
let double = 2 * rounds in
let curr = ref (1 - double) in
let upto = double + 1 in
while !curr < upto do
sum := !sum +. 1. /. float_of_int !curr;
curr := !curr + 4
done;
!sum
``````

Cheers,
Nicolas

4 Likes

So now that we’ve brought this issue up again, is there any work being done to fix it? Will Flambda2 deal with it?

1 Like

Thanks all! That makes sense.

Moving `float_of_int` outside of the loop retires 200M fewer instructions according to `perf stat` but doesn’t actually speed things up on my end.

Yes, and in some cases yes.
The best way to fix it is to use Jane Street’s unboxed types (proposed as an RFC a while ago, with significant parts already implemented). Without that, Flambda 2 can still unbox the float values in the above example, but it’s based on heuristics so there’s no guarantee that it will keep working if small changes are made. Typically, if the function wasn’t tail-recursive it wouldn’t work.

4 Likes

Just to update here that with the recommended imperative approach the time on my laptop now matches the Go time, 0.13s. Just for fun, I’ll try submitting a PR to that repo.

3 Likes