But I feel like I’m just using big numbers without knowing if it’s useless or not.
So I was wondering what Flambda flags do you use and with which value when you don’t care about compilation time and just want that your program is as fast as possible ?
I would try profiling your code before tuning the Flambda parameters. That will help you figure out if there is some function that should have been inlined but wasn’t.
The optimizations that Flambda performs are fairly straightforward. It decides which functions to inline/specialize and then it performs any simplifications or unboxing that get exposed by the inlining. It’s a helpful optimization for large programs with lots of abstractions, but in my experience it doesn’t make much difference for tight loops.
You’re doing regular calls to Sys.time. Those are not cheap. It’s hard to know how much of your time is spent doing these calls, but I’d suggest benchmarking without them (on examples that do not timeout) to get an idea.
You’re doing indirect calls to the eval function. I’m not sure what this function is supposed to do, but if it’s only supposed to do a small computation you’re going to lose a lot of time due to the indirect calls. You might want to experiment with turning your module into a functor taking the eval function as argument, and annotating this functor with [@@inline always]. This should create specialised versions of your functions, that know which eval function will be called and can inline it if needed.
You’re doing quite a lot of array accesses. It’s possible that you could save non-trivial amounts of time by using the unsafe versions of array accesses (Array.unsafe_get and Array.unsafe_set). This of course assumes that your code only accesses arrays correctly. By the way, mod is a relatively expensive operation. You could replace all occurrences of expr mod bound by let index = expr in if index < bound then index else index - bound (assuming the index never reaches 2*bound).
As said above, you won’t know whether your code can be improved or not without profiling. If you’re on Linux, I’d highly recommend experimenting with the perf tool to get a better idea of where your program spends its time.
With the exception of the -O* flags, the Flambda flags are only there to help you when you know what’s going wrong. Some of them aren’t even monotone (meaning that at some point, increasing their value can give you worse performance). So stick with -O3 until you’ve found a reason to increase one of the other parameters.
Thanks a lot for your detailed answer.
I’ll try all this stuff and tell you the difference.
For the eval function, eval i j is just a wrapper to get the distance between 2 cities, which is stored in a dist matrix.
Basically it’s
let create_eval foo =
let dists = ...
in
fun i j -> dists.(i).(j)
I could avoid passing it as an argument by calling create_eval foo since I could pass foo instead of eval.
Do you think I should make my module a fonctor with [@@inline always] or just do what I just described or keep it the way it is?
It seems like Sys.time() is the most consuming function in my program if I interpret caml_sys_time_include_children_unboxed correctly. Thanks A LOT for the input, you just saved my oral exam.