…presumably, this also applies to related functions.
@jsthomas has posted some interesting measurements of different ways to implement List.map
. Some points/questions:
- The overall goal is to have a tail-recursive
List.map
that won’t crash your whole program on a large list (the linked instance was seen on the mailing list just a few days ago!). - It shouldn’t be significantly slower than stdlib’s stack-unsafe
List.map
. That requires optimizations.
- There are at least three effective optimizations: (1) loop unrolling, (2) building the result list front-to-back using mutation, and (3) changing the implementation based on the size of the list.
- Containers and Base use (1) and (3), and Batteries uses (2).
-
@jsthomas was able to combine them all to get a
map
that looks faster than what’s now in Containers, Batteries, and Base. In particular, it maintains good performance for all list sizes. And, it’s likely possible to make @jsthomas’map
go even faster.
- All of Containers, Batteries, and Base are faster than stdlib for small lists. The advantage of Batteries is not as great as the other two, but only Batteries remains faster than stdlib on very large lists.
- Why don’t Containers and Base use mutation like in Batteries? Is it because the memory representations of the types involved aren’t guaranteed across different kinds of OCaml runtimes? Are the representations more likely to coincide if one uses 4.03 inline records to declare the mutable list type? Or perhaps performance on large lists is just not that important in practical situations?
- There is an interesting observation that making a non-tail call can be more expensive than allocating some heap space (in OCaml). Often, programmers don’t think of a function call as an allocation, but it is indeed allocating stack space. However, the full cost of the space should count deallocation, and stack space is reclaimed faster than garbage collection is done in the heap.