Local functions and performance

vlaviron · April 4, 2022, 9:10am

In bytecode, all versions should be more or less equivalent (closures are never optimised away).

In native code, the first version allocates a closure every time you call my_public_function, while the second will have its closure statically allocated if it doesn’t depend on other non-static variables.
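One rough way to check the allocation claim on your own build is to compare minor-heap allocation around the two shapes of local function. This is only a sketch: sum_capturing, sum_closed and minor_words_during are hypothetical names, not code from the thread, and the numbers you get depend on the backend (bytecode or native), on whether Flambda is enabled, and on inlining decisions, so no expected output is given.

```ocaml
(* Hypothetical example: the local function captures some_argument,
   so it needs a closure that holds that value. *)
let sum_capturing some_argument items =
  let add acc x = acc + x + some_argument in
  List.fold_left add 0 items

(* Hypothetical example: the local function has no free variables,
   so its closure does not depend on any non-static value. *)
let sum_closed items =
  let add acc x = acc + x + 1 in
  List.fold_left add 0 items

(* Number of words allocated in the minor heap while running f. *)
let minor_words_during (f : unit -> unit) : float =
  let before = Gc.minor_words () in
  f ();
  Gc.minor_words () -. before

let () =
  let items = [1; 2; 3] in
  let capturing =
    minor_words_during (fun () ->
      for _i = 1 to 1_000 do
        ignore (Sys.opaque_identity (sum_capturing 5 items))
      done)
  in
  let closed =
    minor_words_during (fun () ->
      for _i = 1 to 1_000 do
        ignore (Sys.opaque_identity (sum_closed items))
      done)
  in
  Printf.printf "capturing: %.0f words, closed: %.0f words\n" capturing closed
```

Compiling this with ocamlopt, with and without Flambda, and comparing the two figures should give an idea of whether the closure is allocated on each call in your configuration.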

The performance of my_private_function itself could be affected too. The first version takes one fewer explicit parameter, so it should be slightly faster to call, but an optimisation on the second version offsets that: the extra parameter that passes each function its own closure can be removed for functions that do not need it, so in the end both versions take two arguments. The first version also requires an extra load from the closure to get some_argument (the load occurs once with Flambda, at each use otherwise).

Finally, a non-trivial question is how all of this interacts with the compiler’s optimisations. The function my_public_function is rather small, so it could be considered for inlining, but without Flambda only the third version (with my_private_function out of the body of my_public_function) can be inlined. With Flambda, the second and third versions are equivalent, and the first one could actually be easier to optimise, in particular if you call my_public_function with a constant or statically-allocated argument.
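For readers who do not have the original question at hand, here is a guess at what the three versions look like, reconstructed only from the descriptions in this reply. The function bodies and the _v1/_v2/_v3 suffixes are assumptions made for illustration; only the names my_public_function, my_private_function and some_argument come from the thread.

```ocaml
(* Version 1 (assumed shape): the local function captures some_argument,
   so its closure has to store that value. *)
let my_public_function_v1 some_argument =
  let my_private_function x = x + some_argument in
  my_private_function 1 + my_private_function 2

(* Version 2 (assumed shape): still local, but some_argument is passed
   explicitly, so the local function has no free variables. *)
let my_public_function_v2 some_argument =
  let my_private_function some_argument x = x + some_argument in
  my_private_function some_argument 1 + my_private_function some_argument 2

(* Version 3 (assumed shape): my_private_function is moved out of the
   body of the public function, to the top level. *)
let my_private_function some_argument x = x + some_argument

let my_public_function_v3 some_argument =
  my_private_function some_argument 1 + my_private_function some_argument 2
```

All three compute the same result; they differ only in where my_private_function lives and how it gets some_argument, which is what the points about closure allocation and inlining above are about.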
