What Lambda flags do you usually use to speed up your code?

I’d like to compare my program with the one used in a paper which was coded in cpp.

As I know that they made everything to make their code optimized, I’d like to speed up mine the more I can in order to have a decent comparison.

As my program is very simple (just a few while and for loops) there is nothing I can really do in order to speed it up.

So I was told to use Flambda, and for now I only use (:standard -O3) as flags.
I also tried

 (:standard -rounds 100)
   (:standard inline 500)
   (:standard inline-max-unroll 50)

But I feel like I’m just using big numbers without knowing if it’s useless or not.

So I was wondering what Flambda flags do you use and with which value when you don’t care about compilation time and just want that your program is as fast as possible ?

Is there nothing faster than just -O3 ?

1 Like

I would try profiling your code before tuning the Flambda parameters. That will help you figure out if there is some function that should have been inlined but wasn’t.

The optimizations that Flambda performs are fairly straightforward. It decides which functions to inline/specialize and then it performs any simplifications or unboxing that get exposed by the inlining. It’s a helpful optimization for large programs with lots of abstractions, but in my experience it doesn’t make much difference for tight loops.

1 Like

I already know that the function which takes the most time is my implementation of the 2-opt algorithm. This is basically

let rec while_loop = loop1 ..
and loop1 = loop2 ..
and loop2 = ...

It’s pretty simple, so I guess nothing can be done. I’m not sure if rewriting it with real while and for loop (instead of rec functions) will help.

A few things strike me when looking at your code:

  • You’re doing regular calls to Sys.time. Those are not cheap. It’s hard to know how much of your time is spent doing these calls, but I’d suggest benchmarking without them (on examples that do not timeout) to get an idea.
  • You’re doing indirect calls to the eval function. I’m not sure what this function is supposed to do, but if it’s only supposed to do a small computation you’re going to lose a lot of time due to the indirect calls. You might want to experiment with turning your module into a functor taking the eval function as argument, and annotating this functor with [@@inline always]. This should create specialised versions of your functions, that know which eval function will be called and can inline it if needed.
  • You’re doing quite a lot of array accesses. It’s possible that you could save non-trivial amounts of time by using the unsafe versions of array accesses (Array.unsafe_get and Array.unsafe_set). This of course assumes that your code only accesses arrays correctly. By the way, mod is a relatively expensive operation. You could replace all occurrences of expr mod bound by let index = expr in if index < bound then index else index - bound (assuming the index never reaches 2*bound).

As said above, you won’t know whether your code can be improved or not without profiling. If you’re on Linux, I’d highly recommend experimenting with the perf tool to get a better idea of where your program spends its time.

With the exception of the -O* flags, the Flambda flags are only there to help you when you know what’s going wrong. Some of them aren’t even monotone (meaning that at some point, increasing their value can give you worse performance). So stick with -O3 until you’ve found a reason to increase one of the other parameters.

5 Likes

Thanks a lot for your detailed answer.
I’ll try all this stuff and tell you the difference.
For the eval function, eval i j is just a wrapper to get the distance between 2 cities, which is stored in a dist matrix.
Basically it’s

let create_eval foo = 
let dists = ...
in
fun i j -> dists.(i).(j)

I could avoid passing it as an argument by calling create_eval foo since I could pass foo instead of eval.
Do you think I should make my module a fonctor with [@@inline always] or just do what I just described or keep it the way it is?

If you only use one implementation of eval, you may as well just pass the distance matrix. Avoiding functors is never a bad idea.

1 Like

I just did a perf report as you suggested :

It seems like Sys.time() is the most consuming function in my program if I interpret caml_sys_time_include_children_unboxed correctly. Thanks A LOT for the input, you just saved my oral exam.