Closures, Inlining, performance optimization for anonymous functions

I have some performance-related questions that are pretty similar to each other so I decided to group them under a single title/topic.

  1. Is there any performance difference between?
let rec __add t k v h = ...
let add t k v = __add t k v []

and:

let add t k v =
  let rec __add t k v h = ... in
  __add t k v []
  1. Is there any performance difference between?
let add t k v =
  ...
  let find_and_add t = ... in List.iter find_and_add !(t.nodes)

and:

let add t k v =
  ...
  List.iter (fun t -> ....) !(t.nodes)
  1. Is it possible to tell the compiler not to create a closure or does OCaml’s compiler skip closure creation when the function’s body doesn’t expose parent function’s scope?

(the formatting of your code blocks seems broken)

I haven’t tested this, but I believe that OCaml won’t create a closure if it doesn’t have to (i.e., if the function body doesn’t have any references to free variables).

(see this comment in Base, for example)

1 Like

Formatting point: you want to enclose your code blocks in ``` to make them stand out.

As in

```
let a = 1
```

formats to

let a = 1
1 Like

Thank you for explanation. Fixed.

The following is heavily caveat’ed with test, test and test preferably using something like core_bench.

While testing code I was trying to extract as much performance as possible from using ocaml-4.0.5 I made the following discoveries:

  • Complex functions defined within function scope are less optimised than those defined outside
  • Within function scope small functions that call a function using variables in the outer scope have no impact on optimisation. So, let foo x = let bar y = baz x y in... is fine
  • Partial function application isn’t well optimised, so let f x y = ... in let g = f 1 in g 2. is slower than f 1 2
  • Cross file optimisation is performed, especially in-lining. However, dragons be here! If the external module is overly complex the optimiser can give up even if the functions themselves are simple
  • Monads combined with functors (a common idiom) cost around 15% even for the identity monad (see later).
  • Closures are optimised out if possible, unfortunately this isn’t always: Monads seem to fit in this category
  • Imperative code using while and an FSM for example is slower than the functional version (no real surprise here)

Monad / functor example. Given,

module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val (>>=)  : 'a t -> ('a -> 'b t) -> 'b t
end

module Foo (IO : IO) = struct
  open IO
  let add1 x = x >>= fun y -> return (y + 1)
end

module IO = struct
  type 'a t = 'a
  let return x = x
  let (>>=) x f = f x
end

module Bar = Foo(IO);;
Bar.add1 1

The bind (>>=) adds about 15% overhead which stubbornly refused to be optimised no matter how I cut the code.

This is my experience with one library so reader beware! Also, if you are really looking for smoking performance head over to flambda.

1 Like

So these observations are without flambda, correct? Have you tried with flambda and if yes, what was the difference?

This is without flambda. I haven’t tested with flambda as I am building a library for general use and most people don’t use flambda.

I have to confess I don’t understand why flambda isn’t available as a command line flag on the main compiler but only by a distinct executable. Is flambda’s output incompatible with the normal compiler?

AFAIU, that’s the goal but currently the flambda-enabled compiler is still substantially slower than the classic one even when used in the non-optimizing mode. The two compilers seem different enough that they are kept separate until flambda can fully replace the classic compiler. I may be wrong…