And implemented the following OCaml version of Leibniz’ algorithm to calculate pi (well, not so much implemented as ported over the Python version):
(* ocamlopt -O2 -o leibniz leibniz.ml && strip leibniz *)
let rec calc sum curr upto =
if curr >= upto then sum
else calc (sum +. 1. /. float_of_int curr) (curr + 4) upto
let calc rounds =
let double = rounds * 2 in
calc 0. (1 - double) (double + 1)
let () =
let rounds_txt = open_in_bin "rounds.txt" in
let rounds = int_of_string (input_line rounds_txt) in
close_in rounds_txt;
let pi = 4. *. calc rounds in
Printf.printf "%.16f\n" pi
When I run this with time ./leibniz I consistently get:
$ time ./leibniz
3.1415926435899633
./leibniz 0.38s user 0.00s system 99% cpu 0.388 total
Am I missing anything obvious? The Go version runs in a fraction of that time:
time ./leibniz_go
3.141592663589326
./leibniz_go 0.13s user 0.01s system 95% cpu 0.142 total
Judging from the generated assembly, the OCaml compiler is not able to unbox the float (for the return value) across the recursive call. You can see the subq $16, %r15 cmpq (%r14), %r15 allocation sequence, as well as memory writes inside of the loop of camlExample__calc_5.
An imperative version does quite a bit better (1.235 vs 2.273 on my machine).
To make numeric code go faster, I suggest avoiding recursive calls (which inhibit float unboxing) and sticking to good old while loops, ie something like:
let calc rounds =
let sum = ref 0. in
let double = 2 * rounds in
let curr = ref (1 - double) in
let upto = double + 1 in
while !curr < upto do
sum := !sum +. 1. /. float_of_int !curr;
curr := !curr + 4
done;
!sum
Yes, and in some cases yes.
The best way to fix it is to use Jane Street’s unboxed types (proposed as an RFC a while ago, with significant parts already implemented). Without that, Flambda 2 can still unbox the float values in the above example, but it’s based on heuristics so there’s no guarantee that it will keep working if small changes are made. Typically, if the function wasn’t tail-recursive it wouldn’t work.
Just to update here that with the recommended imperative approach the time on my laptop now matches the Go time, 0.13s. Just for fun, I’ll try submitting a PR to that repo.