Help with performance regression


#1

Hi

I’ve noticed that the following code is causing a large number of allocations and slowing down my application as it is called in a hot path:

let maybe_timeout ~timeout f =
  if Float.(timeout <= 0.0)
  then f ()
  else Lwt_unix.with_timeout timeout f

However, I’ve noticed that if I ignore the timeout value that is passed in and hard-code it instead, the issue goes away:

let maybe_timeout ~timeout f =
  if Float.(timeout <= 0.0)
  then f ()
  else Lwt_unix.with_timeout 5.0 f

Can anyone explain this behavior?

Thanks
Ryan

PS: This is with OCaml 4.07.1


#2

It’s hard to be sure without context, but my guess would be that maybe_timeout is inlined at the callsite, and that the compiler is able to unbox the float value that you pass as the timeout argument. It remains unboxed in the body of the function until the external function Lwt_unix.with_timeout is called. That function expects a boxed argument, so the value has to be boxed again, which allocates on every call.

In the second, hard-coded version, 5.0 is a shared constant (as a boxed float), so no allocation occurs due to boxing.

You might be able to verify that claim by using an [@@inline never] attribute on the maybe_timeout declaration, which should enforce boxing at its callsite and make the performance of both versions consistent (both would then allocate the same amount). I don’t remember whether only flambda honors this attribute, or also the non-flambda middle-end. (And I don’t know whether you use flambda or not.)
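
A minimal sketch of that experiment, using only the standard library. The names with_timeout_stub, last_timeout and minor_words_during are made up for illustration; with_timeout_stub is a hypothetical stand-in for Lwt_unix.with_timeout so that the snippet runs without Lwt, and the plain `<=` replaces the `Float.(...)` comparison from the original post.

```ocaml
(* Hypothetical stand-in for Lwt_unix.with_timeout (assumption: the real
   function is what you would call here). It records the timeout it was
   given so that the call has an observable effect. *)
let last_timeout = ref 0.0

let with_timeout_stub (timeout : float) f =
  last_timeout := timeout;
  f ()
[@@inline never]

(* Same shape as the function in the question, but never inlined. *)
let maybe_timeout ~timeout f =
  if timeout <= 0.0
  then f ()
  else with_timeout_stub timeout f
[@@inline never]

(* Words allocated on the minor heap while running [thunk] [n] times. *)
let minor_words_during n thunk =
  let before = Gc.minor_words () in
  for _i = 1 to n do
    ignore (thunk ())
  done;
  Gc.minor_words () -. before

let () =
  (* Parsed at runtime so the compiler cannot treat it as a constant. *)
  let base = float_of_string "2.5" in
  let words =
    minor_words_during 1_000 (fun () ->
        maybe_timeout ~timeout:(base *. 2.0) (fun () -> ()))
  in
  Printf.printf "minor words allocated: %.0f\n" words
```

The figure printed depends on the compiler version, on whether flambda is enabled, and on how the caller produces the timeout, so this is a measuring harness rather than a reproduction of the original problem. Running it with and without the attribute on maybe_timeout, and with a literal such as 5.0 in place of `base *. 2.0`, gives numbers that can be compared against the hypothesis.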

Another way to verify the hypothesis would be to force boxing of the float before the call. I guess that, for example, calling print_float on that value (before passing it as the timeout argument) could work.
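
A rough sketch of the same idea that avoids printing on every call. It swaps print_float for Sys.opaque_identity, on the assumption (not checked against the generated code) that passing the float through this polymorphic, optimizer-opaque primitive makes the compiler materialize it as a boxed value. with_timeout_stub is again a hypothetical stand-in for Lwt_unix.with_timeout, and call_with_boxed_timeout is a made-up name.

```ocaml
(* Hypothetical stand-in for Lwt_unix.with_timeout; standard library only. *)
let with_timeout_stub (_timeout : float) f = f ()
[@@inline never]

let maybe_timeout ~timeout f =
  if timeout <= 0.0
  then f ()
  else with_timeout_stub timeout f

let call_with_boxed_timeout raw_timeout f =
  (* Assumption: Sys.opaque_identity hides the value from the optimizer,
     so the float is expected to be boxed at this point, before
     maybe_timeout sees it. *)
  let timeout = Sys.opaque_identity raw_timeout in
  maybe_timeout ~timeout f
```

If the allocation count stays the same with and without the Sys.opaque_identity line, that would suggest boxing at this point is not what drives the difference.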


#3

Thanks for your help.

After much more digging, it appears that the issue is actually caused by a bug in my code that gets worse as the timeout value grows. When I hard-coded the value above, I made the mistake of using a small value, which meant that the issue didn’t appear.