I have written this function in three versions with ocaml, rescript and rust. The time spend in js and rust are about 15s, but 35+s in ocaml. I know this is because float is boxed in ocaml, do we have a way to fix it?
In ocaml
let rec fib n result = match n with 0. -> result | n -> fib (n -. 1.) (n +. result)
let start_time = Sys.time () in
let result = fib 10000000000. 0. in
let end_time = Sys.time () in
"time use: "
^ (end_time -. start_time |> string_of_float)
^ "; result: "
^ (result |> string_of_float)
|> print_endline
In rescript.
let rec fib = (n, result) => {
switch n {
| 0. => result
| n => fib(n -. 1., result +. n)
}
}
let timeStart = Js.Date.now()
fib(10000000000., 0.)->Js.log2(`fib: `, _)
let tiemEnd = Js.Date.now()
((tiemEnd -. timeStart) /. 1000.)->Js.log2(`time use: `, _)
In rust
use std::time;
fn fib(n: f64, result: f64) -> f64 {
if n == 0. {
result
} else {
fib(n - 1., result + n)
}
}
fn main() {
let start_time = time::SystemTime::now();
let result = fib(10000000000., 0.);
let end_time = time::SystemTime::now();
let time_use = end_time.duration_since(start_time).unwrap().as_secs();
println!("time use: {time_use}, result: {result}");
}
let n = 10000000000.
let rec fib1 n result = match n with
| 0. -> result
| n -> fib1 (n -. 1.) (n +. result)
let fib2 n =
let n = ref n in
let res = ref 0. in
while !n > 0. do
res := !res +. !n;
n := !n -. 1.;
done;
!res
let timeit f =
let start = Unix.gettimeofday() in
let res = f () in
let t = Unix.gettimeofday() -. start in
t, res
let () =
let t1, r1 = timeit (fun () -> fib1 n 0.) in
Printf.printf "fib1: %f in %.4fs\n%!" r1 t1;
let t2, r2 = timeit (fun () -> fib2 n) in
Printf.printf "fib2: %f in %.4fs\n%!" r2 t2;
()
and it’s much faster (1/4th of the time):
fib1: 50000000009992404992.000000 in 43.7277s
fib2: 50000000009992404992.000000 in 11.6772s
I think it’s indeed due to the fact that OCaml can unbox local floats but might have more difficulty doing that when passing the floats to a recursive function.
The real function that this person tries to optimise is probably using floats, since this person talks about floats that are boxed in ocaml. This is probably just a shorter example here.
If your real function that you try to get faster is more complex than just a fib, you can also use rescript to see how to rewrite it with a while loop (as c-cube said):
$ cat fib.ml
let rec fib n result =
match n with
| 0. -> result
| n -> fib (n -. 1.) (n +. result)
$ bsc fib.ml
// Generated by ReScript, PLEASE EDIT WITH CARE
'use strict';
function fib(_n, _result) {
while(true) {
var result = _result;
var n = _n;
if (n === 0) {
return result;
}
_result = n + result;
_n = n - 1;
continue ;
};
}
exports.fib = fib;
/* No side effect */
Notice that you can give the plain ocaml syntax to the rescript compiler, you don’t need to rewrite it with the rescript syntax.
Thanks, i actually want to write it in ocaml, just tested it in js and rust. And, can we unbox the float manually in ocaml? I am a new bee in ocaml, thanks again.
Yes, I understood, but as c-cube said, a while loop will be faster. If you want to rewrite a loop made with a let rec function into a while loop, you can see how the rescript compiler optimise it, and rewrite it in plain ocaml to follow c-cube’s suggestion. (And this is something that you can do giving plain ocaml syntax to the rescript compiler, you don’t have to rewrite it with rescript syntax.)
short version: Not currently, no. It may be in the future.
long version: Some people have worked on making this possible (more precisely: on explicitly asking for a function parameter to take an unboxed float parameter, which would do the trick here), but there is not a lot of people for which this is a critical need (it’s often not too hard to rewrite without needing this) and they haven’t pushed very hard. In case people are curious, I wrote a summary of compiler PRs about this last year.
Putting aside the more general occurence (which is known to me, floats gets boxed when they cross functions) I’m still a bit surprised by the result here.
My intuition would have been: this function is going to be turned into a loop by the compiler (which an (fib1[@tailcall]) annotation can confirm) and that means the floats are going to be locally unboxed.
But as far as my rusty assembly goes that doesn’t seem to be the case (at least on 4.12).
A tail call is not a loop; it is a function call. No optimization will remove this call until the very end of the compilation pipeline. In particular, the compiler has to respect the calling convention of the recursive function, because it will end up generating a jump to its entry point (modulo some stack adjustment). In other words, this looks a lot like a peephole optimization that would replace the assembly sequence call foo; return by the single opcode branch foo.
Hmm,i also tested the int version of this function, the time cost of ocaml, js and rust were both about 10s. If the tailcall is not optimized into loop in ocaml, why it is so fast?
Because int values in OCaml are always unboxed (that’s the reason why their size is your architecture’s word size minus one bit, you need a bit so that the gc can distinguish them from pointers).
Also as you can see in the link I provided, tail calls do get eventually optimized into a loop, however as @silene mentioned when it does so it’s too late for the local unboxed float optimizations to quick in.
I’d caution selection bias here. The OCaml community consists of people who tolerate OCaml’s shortcomings including this one.
Try going to another language community and suggesting they box individual floats and see how they respond!
If there is genuine interest in fostering adoption it would be good to quantify what proportion of developers in general would want or expect such features. I suspect a few features would make a huge difference to many people who do not currently use OCaml.
[just asking here]
I remember when I brought up Rust iterators, one response was “we can do the same thing with Seq and aggressive flambda, etc, etc, etc”. But this seems to suggest that if I write a bunch of code with unboxed floats that uses Seq and such, maybe lots of lambda-abstractions, the code might not get flattened-down into loops soon enough for unboxing to happen ?
Which would argue that if I care about performance, I’d want my iterator “tower” to consist in operations that didn’t need significant analysis to be squashed down into loops.
I would love to have an ability to vote on areas where the community would like the core devs to focus on next. In my mind, this is a very big issue that needs to be addressed, as it means that if you happen to work with floats, you need to use completely different coding paradigms (non functional ones) to get reasonable performance.
The choices made when designing the language were very much for specific aspects such as symbolic programming, privileging integers and reducing allocation as much as possible. The downside is that other aspects, which are very common in general programming, such as floating point math, are second class citizens.
The situation is not that simple. There are multiple ways of encoding datatypes and it is not that obvious that some other encoding that favor floating-point numbers would actually cripple symbolic programming. For instance, several javascript engines rely on NaN-boxing, which makes it possible to unbox both 64-bit floating-point numbers and 32-bit integers (while still keeping an OCaml-like garbage collector). If you were to ask symbolic programmers which of 63-bit integers (but no 64-bit integers) or floating-point numbers should be unboxed, what would be the consensus?
I think that the good long-term approach to those issues is to add expressivity in the source language to talk about unboxed floats and other “value types”, which is a topic that @lpw25, @stedolan and Antal Spector-Zabusky are working on. It will not magically solve all float-performance issues, but it will give a clear, modular language to express solutions, manually-written by the user (requires some expertise) or automated by the optimizer. I don’t know if/when their work will be proposed for upstreaming, but I have come to accept that these things take time, especially given that many of the people contributing their work on OCaml are working on many things at once.
(I think that we should keep growing the set of people contributing to the language and implementation, rather than trying to redirect the workforce of people already contributing.)