I have some performance-related questions that are pretty similar to each other so I decided to group them under a single title/topic.
Is there any performance difference between?
let rec __add t k v h = ...
let add t k v = __add t k v []
and:
let add t k v =
let rec __add t k v h = ... in
__add t k v []
Is there any performance difference between?
let add t k v =
...
let find_and_add t = ... in List.iter find_and_add !(t.nodes)
and:
let add t k v =
...
List.iter (fun t -> ....) !(t.nodes)
Is it possible to tell the compiler not to create a closure or does OCaml’s compiler skip closure creation when the function’s body doesn’t expose parent function’s scope?
I haven’t tested this, but I believe that OCaml won’t create a closure if it doesn’t have to (i.e., if the function body doesn’t have any references to free variables).
The following is heavily caveat’ed with test, test and test preferably using something like core_bench.
While testing code I was trying to extract as much performance as possible from using ocaml-4.0.5 I made the following discoveries:
Complex functions defined within function scope are less optimised than those defined outside
Within function scope small functions that call a function using variables in the outer scope have no impact on optimisation. So, let foo x = let bar y = baz x y in... is fine
Partial function application isn’t well optimised, so let f x y = ... in let g = f 1 in g 2. is slower than f 1 2
Cross file optimisation is performed, especially in-lining. However, dragons be here! If the external module is overly complex the optimiser can give up even if the functions themselves are simple
Monads combined with functors (a common idiom) cost around 15% even for the identity monad (see later).
Closures are optimised out if possible, unfortunately this isn’t always: Monads seem to fit in this category
Imperative code using while and an FSM for example is slower than the functional version (no real surprise here)
Monad / functor example. Given,
module type IO = sig
type 'a t
val return : 'a -> 'a t
val (>>=) : 'a t -> ('a -> 'b t) -> 'b t
end
module Foo (IO : IO) = struct
open IO
let add1 x = x >>= fun y -> return (y + 1)
end
module IO = struct
type 'a t = 'a
let return x = x
let (>>=) x f = f x
end
module Bar = Foo(IO);;
Bar.add1 1
The bind (>>=) adds about 15% overhead which stubbornly refused to be optimised no matter how I cut the code.
This is my experience with one library so reader beware! Also, if you are really looking for smoking performance head over to flambda.
I have to confess I don’t understand why flambda isn’t available as a command line flag on the main compiler but only by a distinct executable. Is flambda’s output incompatible with the normal compiler?
AFAIU, that’s the goal but currently the flambda-enabled compiler is still substantially slower than the classic one even when used in the non-optimizing mode. The two compilers seem different enough that they are kept separate until flambda can fully replace the classic compiler. I may be wrong…