Functional long arrays

Arrays are mainly used in imperative programming. Functional programmers use lists (at least if they want pure functional programming).

It is not complicated to use arrays in a functional setting i.e. without mutability. Appending a new element at the end requires the allocation of a new array, the copying of the old elements and the setting of the last element. Nondestructive overwriting requires the copying of the old elements and setting a element at a certain position. However if the array has a certain size, the overhead eats up a lot of performance.

But arrays are indispensable if random access to elements is required. Accessing the ith element of a list requires an iteration over the list until the element is found.

In order to achieve a good performance for random access, insertion and deletion at one end and updating of inner elements in a functional setting, radix balanced arrays are a good choice. A radix balanced array splits up a long array into array chunks which fit into the cache line of modern computers. Within a cache line the moving of blocks of elements of an array is very fast. The chunks are stored within a tree structure such that random access is very fast as well.

The new version (0.4.0) of fmlib specifically the component fmlib_std has the datatype Rb_array which implements a radix balanced array an offers fast random access, fast insertion and deletion at the rear end and fast updating operations. All operations are completely functional i.e. the datastructure Rb_array is immutable like a list.

References:

18 Likes

Hi,

I benchmarked 15% speedup on update operations by replacing blit operations with a copy, to save the need to write twice in every array cell. Here are more efficient functions to use for functional arrays:

let replace (i: int) (x: 'a) (xs: 'a array): 'a array =
    let arr = Array.copy xs in
    arr.(i) <- x;
    arr

let remove (i: int) (xs: 'a array): 'a array =
    let len = length xs in
    Array.sub xs 0 (len - 1)

Another minor comment, your interface file reads:

The radix balanced array stores the information within array chunks which are not bigger than a cache line (usually 32 machine words or more on todays microprocessors).

But as far as I know, most cache lines are only 64 bytes, that is, 8 words, long.
It does not remove any of the benefits of using chunks, but just beware that a chunk is thus usually made of a least a few distinct cache lines.

Best,
Arthur

Hi Arthur. Thanks for the comments.

  1. I agree with your suggestion to use copy instead of the equivalent blit operations in the function replace. Even if it were not more efficient, the code becomes significantly clearer. So the benefit is more efficiency and clearer code.

  2. Your suggestion to use sub in the function remove does not seem to be correct. As far as I see, your suggestion removes the last element from the array and not the ith element. There is also a function remove_last in the library which is implemented exactly as you proposed.

  3. You are absolutely right with your remark concerning cache lines. A 32 word size
    cache line would need a cache line size of 256 bytes with a word size of 8 bytes. Usual cache line sizes seem to be 32, 64 or 128 bytes on modern computers. I don’t know from where I have got the information that cache line sizes are usually 32 machine words.

    If we assume a cache line size of 64 or 128 bytes and a word size of 8 bytes we either get 8 or 16 machine words into a cache line. I have chosen 32 as the size of the array chunks. Therefore an array chunk occupies 2 or 4 cache lines.

    For me the question remains whether a chunk size of 32 is a good compromise. The size has to be small enough to make array copy operations fast and big enough to limit the height of the trees. Any suggestions to improve the chunk size are welcome.

Cheers
Helmut

  • Sorry I got confused between remove and remove_last. Indeed, your remove_last is using sub as it should.

  • Regarding chunk size, I am currently working on benchmarks and mathematical analyses to attempt to characterize the best values, depending on the ratio between get
    and set operations. (François Pottier and I use chunks in the opam Sek package.)

@hbr Your design of long arrays is very convincing but it would be great to have a benchmark program in the repo to test it on my machine.

I think batteries has read-only arrays.
I’ve never used them though.
Cf. BatArray.Cap.
While that’s not purely functional, at least you can have a type-guaranteed read-only array,
if that’s really what you need.

@charguer Hi Arthur,

your work on the Sek package looks very interesting. I would be interested in any results you get and appreciate any information.

Regards
Helmut

@Kakadu I am not very skilled in writing benchmark programs. However if you have one (or some basic idea on how it should look like), I could include it in the repo.