How to speed up this function?

Mng12345 · August 10, 2022, 1:46am

I have written this function in three versions with ocaml, rescript and rust. The time spend in js and rust are about 15s, but 35+s in ocaml. I know this is because float is boxed in ocaml, do we have a way to fix it?

In ocaml

let rec fib n result = match n with 0. -> result | n -> fib (n -. 1.) (n +. result)
let start_time = Sys.time () in
let result = fib 10000000000. 0. in
let end_time = Sys.time () in
"time use: "
^ (end_time -. start_time |> string_of_float)
^ "; result: "
^ (result |> string_of_float)
|> print_endline

In rescript.

let rec fib = (n, result) => {
  switch n {
  | 0. => result
  | n => fib(n -. 1., result +. n)
  }
}


let timeStart = Js.Date.now()
fib(10000000000., 0.)->Js.log2(`fib: `, _)
let tiemEnd = Js.Date.now()
((tiemEnd -. timeStart) /. 1000.)->Js.log2(`time use: `, _)

In rust

use std::time;

fn fib(n: f64, result: f64) -> f64 {
    if n == 0. {
        result
    } else {
        fib(n - 1., result + n)
    }
}

fn main() {
    let start_time = time::SystemTime::now();
    let result = fib(10000000000., 0.);
    let end_time = time::SystemTime::now();
    let time_use = end_time.duration_since(start_time).unwrap().as_secs();
    println!("time use: {time_use}, result: {result}");
}

c-cube · August 10, 2022, 2:12am

I tried this:

let n = 10000000000.

let rec fib1 n result = match n with
  | 0. -> result
  | n -> fib1 (n -. 1.) (n +. result)

let fib2 n =
  let n = ref n in
  let res = ref 0. in
  while !n > 0.  do
    res := !res +. !n;
    n := !n -. 1.;
  done;
  !res

let timeit f =
  let start = Unix.gettimeofday() in
  let res = f () in
  let t = Unix.gettimeofday() -. start in
  t, res

let () =
  let t1, r1 = timeit (fun () -> fib1 n 0.) in
  Printf.printf "fib1: %f in %.4fs\n%!" r1 t1;
  let t2, r2 = timeit (fun () -> fib2 n) in
  Printf.printf "fib2: %f in %.4fs\n%!" r2 t2;
  ()

and it’s much faster (1/4th of the time):

fib1: 50000000009992404992.000000 in 43.7277s
fib2: 50000000009992404992.000000 in 11.6772s

I think it’s indeed due to the fact that OCaml can unbox local floats but might have more difficulty doing that when passing the floats to a recursive function.

pdonadeo · August 10, 2022, 8:15am

By the way, why using floats? Using integers is much faster and for n > 134217727 the result is mathematically wrong using floats.

fccm · August 10, 2022, 8:27am

The real function that this person tries to optimise is probably using floats, since this person talks about floats that are boxed in ocaml. This is probably just a shorter example here.

fccm · August 10, 2022, 8:34am

If your real function that you try to get faster is more complex than just a fib, you can also use rescript to see how to rewrite it with a while loop (as c-cube said):

$ cat fib.ml
let rec fib n result =
  match n with
  | 0. -> result
  | n -> fib (n -. 1.) (n +. result)

$ bsc fib.ml
// Generated by ReScript, PLEASE EDIT WITH CARE
'use strict';

function fib(_n, _result) {
  while(true) {
    var result = _result;
    var n = _n;
    if (n === 0) {
      return result;
    }
    _result = n + result;
    _n = n - 1;
    continue ;
  };
}

exports.fib = fib;
/* No side effect */

Notice that you can give the plain ocaml syntax to the rescript compiler, you don’t need to rewrite it with the rescript syntax.

Mng12345 · August 10, 2022, 8:43am

Thanks, i actually want to write it in ocaml, just tested it in js and rust. And, can we unbox the float manually in ocaml? I am a new bee in ocaml, thanks again.

fccm · August 10, 2022, 9:07am

Yes, I understood, but as c-cube said, a while loop will be faster. If you want to rewrite a loop made with a let rec function into a while loop, you can see how the rescript compiler optimise it, and rewrite it in plain ocaml to follow c-cube’s suggestion. (And this is something that you can do giving plain ocaml syntax to the rescript compiler, you don’t have to rewrite it with rescript syntax.)

gasche · August 10, 2022, 4:57pm

And, can we unbox the float manually in ocaml?

short version: Not currently, no. It may be in the future.

long version: Some people have worked on making this possible (more precisely: on explicitly asking for a function parameter to take an unboxed float parameter, which would do the trick here), but there is not a lot of people for which this is a critical need (it’s often not too hard to rewrite without needing this) and they haven’t pushed very hard. In case people are curious, I wrote a summary of compiler PRs about this last year.

dbuenzli · August 10, 2022, 5:56pm

Putting aside the more general occurence (which is known to me, floats gets boxed when they cross functions) I’m still a bit surprised by the result here.

My intuition would have been: this function is going to be turned into a loop by the compiler (which an (fib1[@tailcall]) annotation can confirm) and that means the floats are going to be locally unboxed.

But as far as my rusty assembly goes that doesn’t seem to be the case (at least on 4.12).

silene · August 10, 2022, 8:16pm

A tail call is not a loop; it is a function call. No optimization will remove this call until the very end of the compilation pipeline. In particular, the compiler has to respect the calling convention of the recursive function, because it will end up generating a jump to its entry point (modulo some stack adjustment). In other words, this looks a lot like a peephole optimization that would replace the assembly sequence call foo; return by the single opcode branch foo.

dbuenzli · August 10, 2022, 8:30pm

Thanks for the explanation. It seems I have quite a bunch of recursive functions to rewrite as loops now that you corrected my wrong understanding…

Mng12345 · August 11, 2022, 12:08am

Hmm，i also tested the int version of this function, the time cost of ocaml, js and rust were both about 10s. If the tailcall is not optimized into loop in ocaml, why it is so fast?

dbuenzli · August 11, 2022, 12:19am

Because int values in OCaml are always unboxed (that’s the reason why their size is your architecture’s word size minus one bit, you need a bit so that the gc can distinguish them from pointers).

Also as you can see in the link I provided, tail calls do get eventually optimized into a loop, however as @silene mentioned when it does so it’s too late for the local unboxed float optimizations to quick in.

Jon_Harrop · August 11, 2022, 3:40am

I’d caution selection bias here. The OCaml community consists of people who tolerate OCaml’s shortcomings including this one.

Try going to another language community and suggesting they box individual floats and see how they respond!

If there is genuine interest in fostering adoption it would be good to quantify what proportion of developers in general would want or expect such features. I suspect a few features would make a huge difference to many people who do not currently use OCaml.

Chet_Murthy · August 11, 2022, 4:13am

[just asking here]
I remember when I brought up Rust iterators, one response was “we can do the same thing with Seq and aggressive flambda, etc, etc, etc”. But this seems to suggest that if I write a bunch of code with unboxed floats that uses Seq and such, maybe lots of lambda-abstractions, the code might not get flattened-down into loops soon enough for unboxing to happen ?

Which would argue that if I care about performance, I’d want my iterator “tower” to consist in operations that didn’t need significant analysis to be squashed down into loops.

Just asking.

bluddy · August 11, 2022, 6:18am

I would love to have an ability to vote on areas where the community would like the core devs to focus on next. In my mind, this is a very big issue that needs to be addressed, as it means that if you happen to work with floats, you need to use completely different coding paradigms (non functional ones) to get reasonable performance.

Mng12345 · August 11, 2022, 6:28am

I choose ocaml for the performance, but it seems it’s not better than js(rescript) in some places.

bluddy · August 11, 2022, 6:59am

The choices made when designing the language were very much for specific aspects such as symbolic programming, privileging integers and reducing allocation as much as possible. The downside is that other aspects, which are very common in general programming, such as floating point math, are second class citizens.

silene · August 11, 2022, 9:26am

The situation is not that simple. There are multiple ways of encoding datatypes and it is not that obvious that some other encoding that favor floating-point numbers would actually cripple symbolic programming. For instance, several javascript engines rely on NaN-boxing, which makes it possible to unbox both 64-bit floating-point numbers and 32-bit integers (while still keeping an OCaml-like garbage collector). If you were to ask symbolic programmers which of 63-bit integers (but no 64-bit integers) or floating-point numbers should be unboxed, what would be the consensus?

gasche · August 11, 2022, 9:51am

I think that the good long-term approach to those issues is to add expressivity in the source language to talk about unboxed floats and other “value types”, which is a topic that @lpw25, @stedolan and Antal Spector-Zabusky are working on. It will not magically solve all float-performance issues, but it will give a clear, modular language to express solutions, manually-written by the user (requires some expertise) or automated by the optimizer. I don’t know if/when their work will be proposed for upstreaming, but I have come to accept that these things take time, especially given that many of the people contributing their work on OCaml are working on many things at once.

(I think that we should keep growing the set of people contributing to the language and implementation, rather than trying to redirect the workforce of people already contributing.)

Topic		Replies	Views
Ocaml speed : recursive function optimization Learning ocaml	3	758	December 2, 2023
Ocaml Fibonacci is faster compared to lisp/scheme Learning	15	2585	March 27, 2023
[help]Improving the performance of this code Learning	33	1070	March 24, 2023
OCaml speed comparison - calculating pi with Leibniz - optimize? Learning	8	766	May 6, 2023
Any suggestions on making this piece of OCaml numerical code more efficient? Learning performance , numerical , comp-chem	28	1091	February 27, 2024

How to speed up this function?

Related topics