Closures, Inlining, performance optimization for anonymous functions

I have some performance-related questions that are pretty similar to each other so I decided to group them under a single title/topic.

  1. Is there any performance difference between?
let rec __add t k v h = ...
let add t k v = __add t k v []

and:

let add t k v =
  let rec __add t k v h = ... in
  __add t k v []
  1. Is there any performance difference between?
let add t k v =
  ...
  let find_and_add t = ... in List.iter find_and_add !(t.nodes)

and:

let add t k v =
  ...
  List.iter (fun t -> ....) !(t.nodes)
  1. Is it possible to tell the compiler not to create a closure or does OCaml’s compiler skip closure creation when the function’s body doesn’t expose parent function’s scope?

(the formatting of your code blocks seems broken)

I haven’t tested this, but I believe that OCaml won’t create a closure if it doesn’t have to (i.e., if the function body doesn’t have any references to free variables).

(see this comment in Base, for example)

Formatting point: you want to enclose your code blocks in ``` to make them stand out.

As in

```
let a = 1
```

formats to

let a = 1

Thank you for explanation. Fixed.

The following is heavily caveat’ed with test, test and test preferably using something like core_bench.

While testing code I was trying to extract as much performance as possible from using ocaml-4.0.5 I made the following discoveries:

  • Complex functions defined within function scope are less optimised than those defined outside
  • Within function scope small functions that call a function using variables in the outer scope have no impact on optimisation. So, let foo x = let bar y = baz x y in... is fine
  • Partial function application isn’t well optimised, so let f x y = ... in let g = f 1 in g 2. is slower than f 1 2
  • Cross file optimisation is performed, especially in-lining. However, dragons be here! If the external module is overly complex the optimiser can give up even if the functions themselves are simple
  • Monads combined with functors (a common idiom) cost around 15% even for the identity monad (see later).
  • Closures are optimised out if possible, unfortunately this isn’t always: Monads seem to fit in this category
  • Imperative code using while and an FSM for example is slower than the functional version (no real surprise here)

Monad / functor example. Given,

module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val (>>=)  : 'a t -> ('a -> 'b t) -> 'b t
end

module Foo (IO : IO) = struct
  open IO
  let add1 x = x >>= fun y -> return (y + 1)
end

module IO = struct
  type 'a t = 'a
  let return x = x
  let (>>=) x f = f x
end

module Bar = Foo(IO);;
Bar.add1 1

The bind (>>=) adds about 15% overhead which stubbornly refused to be optimised no matter how I cut the code.

This is my experience with one library so reader beware! Also, if you are really looking for smoking performance head over to flambda.

So these observations are without flambda, correct? Have you tried with flambda and if yes, what was the difference?

This is without flambda. I haven’t tested with flambda as I am building a library for general use and most people don’t use flambda.

I have to confess I don’t understand why flambda isn’t available as a command line flag on the main compiler but only by a distinct executable. Is flambda’s output incompatible with the normal compiler?

AFAIU, that’s the goal but currently the flambda-enabled compiler is still substantially slower than the classic one even when used in the non-optimizing mode. The two compilers seem different enough that they are kept separate until flambda can fully replace the classic compiler. I may be wrong…