Array.map vs List.map performance

All other things being equal, would you expect Array.map or List.map to be faster, given that the arguments have the same number of elements n?

As far as I can tell, this should be equivalent to “Is it faster to allocate many small regions of memory or one large region of memory”. One also has to traverse the input list, which is bad for cache locality.

Maybe the question depends on the size of n.

I would guess that Array.map is faster but this is just intuition, I would like to know if someone has benchmarked this or has internal knowledge of how OCaml allocates memory that would point in one direction or the other.

It depends.

In isolation, Array.map is usually a bit slower, as it involves GC barriers. But the time of the function itself is often less important than the impact on the rest of the code.

For very small values of n, lists could end up much better because the optimiser can track them (this is mostly relevant with the native compiler with flambda on). For instance, List.map f [x] can be optimised into [ f x ], removing the initial allocation for [x], and if you chain list-manipulating functions you could end up not actually allocating any lists except the last one. But that’s very limited: you need manual annotations ([@unrolled]) if you want that to work with lists bigger than 1, even at -O3.

For very large values of n arrays will likely be better due to the reduced memory footprint. Additionally, they have better cache locality both because of the more compact representation and the guarantee that all elements are next to each other, so when iterating on the result it will likely end up faster.

You should definitively run your own benchmarks. Not only between List and Array, but changing also the context (size, what kind of function is being mapped) can be very instructive.

1 Like

Thanks for the insight. Can you explain you mean by “GC barriers”?
I will definitely run my own test, I see what you mean about manipulating more of the variables to get a better understanding.

In the case of Array.map, the target array preexists any of the values (f src.(i)) that are being computed to fill it. Therefore, the array might be old enough to have been moved to the major heap while some of its content is still fresh enough to live on the minor heap. As such, the runtime needs to perform some extra work so that the values living on the minor heap are not incorrectly freed during the next garbage collection. This costly extra work is the “GC barrier”.

In the case of List.map, the new list cell is created right after the value that fills it has been computed, which means that younger values point to older values, and not the opposite. As such, no values can be mistakenly freed, so no extra work is need.

2 Likes

Interesting, thank you!

The answer will depend not only (strongly) on n but also on the element type and the function argument, particularly the allocation patterns of the function argument.

If you map (+) 1 over a collection of ints I’d expect the performance to be comparable except for an earlier jump to out-of-cache with List.map because it requires more memory.

If you map something that allocates a temporary over a collection of floats then I’d expect Array.map to be substantially faster.

A few years ago, I had a conversation with Jean-Christophe Filliatre, about this very subject. He was pretty firm on the position that for almost all the typical uses we make of lists, arrays are faster and more compact. My memory was that he had benchmarks on nontrivial codebases, behind this position, but I could be misremembering.

Separately, in 1992-4 while implementing Coq V5.10, I found that to represent (M N1 N2 ... Nk) (the iterated application of M to N1, then … Nk), representing the args as an array was much faster than as a list, and this was on the entire corpus of Coq’s benchmarks at the time.

ETA: There are two cases where arrays really win big: (1) when data is read more than consed, e.g. in a theorem-prover where you recurse over data a lot, Array.map incurs fewer pointer-follows than List.map, hence much faster; (2) arrays are much smaller (one cell instead of three), so heap-size savings.