Has anyone ever implemented vectorized
operations for float arrays or Bigarray.Array1 of floats?
I guess via some C code calling specialised “multimedia” (SIMD) CPU instructions.
In some simulation code, translating a molecule is just adding
a small constant to all X values, another constant to all Y values, and another constant to all Z values
(if we have a struct-of-arrays architecture).
So, at some point, I might be interested in such optimizations.
If you have another suggestion (like using the OCaml bindings to BLAS/LAPACK, or something I don’t know about yet), just let me know.
Translating all atoms at once sounds like it could benefit from array-of-structs and then calling daxpy from BLAS via lacaml? One probably needs to weigh the benefit against the potentially more cumbersome access to atoms.
At LexiFi we use 1-dimension bigarrays for most of our numerical code. As far as vectorization is concerned, we mostly use two techniques:
We have many small C primitives working on arrays of doubles, e.g.
```c
void mlfi_mul(int size, double *src1, double *src2, double *dst)
{
  for (int k = 0; k < size; k++) {
    dst[k] = src1[k] * src2[k];
  }
}
```
and then rely on GCC’s auto-vectorizer (enabled when compiling with -O3, or by passing -ftree-vectorize). Note, however, that the vectorized code may use specific processor extensions (e.g. AVX) and will fail at runtime, with an illegal-instruction error, if the processor does not support the necessary instructions. At LexiFi we actually compile the same code for several instruction sets and do a runtime check to decide which code to use. But depending on your needs, you may be able to skip this extra complication and just assume that your target processor implements the necessary instructions.
We also wrote bindings for the MKL library, which insulates you from the different instruction sets; and my impression (but I am not an expert) is that it brings some more advanced optimizations than what you can get naïvely from GCC.
I’ve written a vectorized bitvector library (GitHub - jfeser/bitarray: Fast vectorized bitarrays for OCaml). It uses ISPC to implement the kernels (although gcc does a great job with simple kernels). ISPC deals with the vector instructions, and it can emit code that selects the right kernel at runtime, depending on which instructions are available. Doing something similar with float arrays should be straightforward.
All that said, for the uses you describe, BLAS sounds like the right choice.
Antoine Mine suggested that I use C code working on bigarrays, as you do, with gcc options
such as -O3, or even -O3 -mtune=native -march=native.
This sounds like a very simple and pragmatic suggestion.
I use lacaml, which is easy to use and install, with very few dependencies.
Lacaml can use MKL by setting environment variables when compiling. This worked 2 years ago (but I don’t know if it still works).
I haven’t used the library in a while, and I think it was mentioned that, since the function signatures use tuples, it would lead to unnecessary allocations. I will probably revisit the library sometime in the future.