Closures, Inlining, performance optimization for anonymous functions

egoholic · April 24, 2018, 3:37pm

I have some performance-related questions that are pretty similar to each other so I decided to group them under a single title/topic.

Is there any performance difference between?

let rec __add t k v h = ...
let add t k v = __add t k v []

and:

let add t k v =
  let rec __add t k v h = ... in
  __add t k v []

Is there any performance difference between?

let add t k v =
  ...
  let find_and_add t = ... in List.iter find_and_add !(t.nodes)

and:

let add t k v =
  ...
  List.iter (fun t -> ....) !(t.nodes)

Is it possible to tell the compiler not to create a closure or does OCaml’s compiler skip closure creation when the function’s body doesn’t expose parent function’s scope?

Armael · April 24, 2018, 3:58pm

(the formatting of your code blocks seems broken)

bcc32 · April 24, 2018, 4:41pm

I haven’t tested this, but I believe that OCaml won’t create a closure if it doesn’t have to (i.e., if the function body doesn’t have any references to free variables).

(see this comment in Base, for example)

perry · April 24, 2018, 5:53pm

Formatting point: you want to enclose your code blocks in ``` to make them stand out.

As in

```
let a = 1
```

formats to

let a = 1

egoholic · April 24, 2018, 6:44pm

Thank you for explanation. Fixed.

blez · April 25, 2018, 6:35pm

The following is heavily caveat’ed with test, test and test preferably using something like core_bench.

While testing code I was trying to extract as much performance as possible from using ocaml-4.0.5 I made the following discoveries:

Complex functions defined within function scope are less optimised than those defined outside
Within function scope small functions that call a function using variables in the outer scope have no impact on optimisation. So, let foo x = let bar y = baz x y in... is fine
Partial function application isn’t well optimised, so let f x y = ... in let g = f 1 in g 2. is slower than f 1 2
Cross file optimisation is performed, especially in-lining. However, dragons be here! If the external module is overly complex the optimiser can give up even if the functions themselves are simple
Monads combined with functors (a common idiom) cost around 15% even for the identity monad (see later).
Closures are optimised out if possible, unfortunately this isn’t always: Monads seem to fit in this category
Imperative code using while and an FSM for example is slower than the functional version (no real surprise here)

Monad / functor example. Given,

module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val (>>=)  : 'a t -> ('a -> 'b t) -> 'b t
end

module Foo (IO : IO) = struct
  open IO
  let add1 x = x >>= fun y -> return (y + 1)
end

module IO = struct
  type 'a t = 'a
  let return x = x
  let (>>=) x f = f x
end

module Bar = Foo(IO);;
Bar.add1 1

The bind (>>=) adds about 15% overhead which stubbornly refused to be optimised no matter how I cut the code.

This is my experience with one library so reader beware! Also, if you are really looking for smoking performance head over to flambda.

n4323 · April 25, 2018, 8:11pm

So these observations are without flambda, correct? Have you tried with flambda and if yes, what was the difference?

blez · April 27, 2018, 3:19pm

This is without flambda. I haven’t tested with flambda as I am building a library for general use and most people don’t use flambda.

perry · April 27, 2018, 3:54pm

I have to confess I don’t understand why flambda isn’t available as a command line flag on the main compiler but only by a distinct executable. Is flambda’s output incompatible with the normal compiler?

n4323 · April 27, 2018, 4:13pm

AFAIU, that’s the goal but currently the flambda-enabled compiler is still substantially slower than the classic one even when used in the non-optimizing mode. The two compilers seem different enough that they are kept separate until flambda can fully replace the classic compiler. I may be wrong…

Topic		Replies	Views
Local functions and performance Learning optimization , performance	6	1622	April 11, 2022
Tail recursive loops and closures Learning performance	3	856	November 22, 2022
Why do Seq.fold_left and Seq.iter recurse on a closure rather than at the first level? Learning	2	727	February 6, 2020
Styles in defining functions Learning performance	3	579	June 7, 2022
How to avoid write barrier slowdown Learning	6	1517	June 11, 2018

Closures, Inlining, performance optimization for anonymous functions

Related topics