Functional long arrays

Arrays are mainly used in imperative programming. Functional programmers use lists (at least if they want pure functional programming).

It is not complicated to use arrays in a functional setting i.e. without mutability. Appending a new element at the end requires the allocation of a new array, the copying of the old elements and the setting of the last element. Nondestructive overwriting requires the copying of the old elements and setting a element at a certain position. However if the array has a certain size, the overhead eats up a lot of performance.

But arrays are indispensable if random access to elements is required. Accessing the ith element of a list requires an iteration over the list until the element is found.

In order to achieve a good performance for random access, insertion and deletion at one end and updating of inner elements in a functional setting, radix balanced arrays are a good choice. A radix balanced array splits up a long array into array chunks which fit into the cache line of modern computers. Within a cache line the moving of blocks of elements of an array is very fast. The chunks are stored within a tree structure such that random access is very fast as well.

The new version (0.4.0) of fmlib specifically the component fmlib_std has the datatype Rb_array which implements a radix balanced array an offers fast random access, fast insertion and deletion at the rear end and fast updating operations. All operations are completely functional i.e. the datastructure Rb_array is immutable like a list.

References:

17 Likes

Hi,

I benchmarked 15% speedup on update operations by replacing blit operations with a copy, to save the need to write twice in every array cell. Here are more efficient functions to use for functional arrays:

let replace (i: int) (x: 'a) (xs: 'a array): 'a array =
    let arr = Array.copy xs in
    arr.(i) <- x;
    arr

let remove (i: int) (xs: 'a array): 'a array =
    let len = length xs in
    Array.sub xs 0 (len - 1)

Another minor comment, your interface file reads:

The radix balanced array stores the information within array chunks which are not bigger than a cache line (usually 32 machine words or more on todays microprocessors).

But as far as I know, most cache lines are only 64 bytes, that is, 8 words, long.
It does not remove any of the benefits of using chunks, but just beware that a chunk is thus usually made of a least a few distinct cache lines.

Best,
Arthur