A few things strike me when looking at your code:
- You’re doing regular calls to
Sys.time
. Those are not cheap. It’s hard to know how much of your time is spent doing these calls, but I’d suggest benchmarking without them (on examples that do not timeout) to get an idea. - You’re doing indirect calls to the
eval
function. I’m not sure what this function is supposed to do, but if it’s only supposed to do a small computation you’re going to lose a lot of time due to the indirect calls. You might want to experiment with turning your module into a functor taking theeval
function as argument, and annotating this functor with[@@inline always]
. This should create specialised versions of your functions, that know whicheval
function will be called and can inline it if needed. - You’re doing quite a lot of array accesses. It’s possible that you could save non-trivial amounts of time by using the unsafe versions of array accesses (
Array.unsafe_get
andArray.unsafe_set
). This of course assumes that your code only accesses arrays correctly. By the way,mod
is a relatively expensive operation. You could replace all occurrences ofexpr mod bound
bylet index = expr in if index < bound then index else index - bound
(assuming the index never reaches2*bound
).
As said above, you won’t know whether your code can be improved or not without profiling. If you’re on Linux, I’d highly recommend experimenting with the perf
tool to get a better idea of where your program spends its time.
With the exception of the -O*
flags, the Flambda flags are only there to help you when you know what’s going wrong. Some of them aren’t even monotone (meaning that at some point, increasing their value can give you worse performance). So stick with -O3
until you’ve found a reason to increase one of the other parameters.