Fresh from a weekend of hacking, I would like to share some results from an experiment: a library that exposes Intel AVX2 intrinsics to OCaml code. AVX2 is an x86 instruction set extension that provides data-parallel (SIMD) operations on 256-bit registers.
Given a type Bigstring.t (1-dimensional byte arrays), there now exist functions such as:
val cmpeq_i8 : (t * int) -> (t * int) -> (t * int) -> unit
cmpeq_i8 (x,o1) (y,o2) (z,o3) will compare 32 bytes starting at offset o1 in x and offset o2 in y respectively and store the result in z starting at offset o3.
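To make the semantics concrete, here is a minimal scalar model of what a 32-lane byte-wise compare computes. It is only a sketch: Bytes.t stands in for Bigstring.t, the name cmpeq_i8_model is made up for illustration, and it assumes the result is stored the way the underlying AVX2 instruction produces it (0xFF for equal lanes, 0x00 otherwise).

```ocaml
(* Scalar reference model of a 32-lane byte-wise equality compare.
   Assumptions: Bytes.t stands in for Bigstring.t, and equal lanes are
   written as 0xFF, unequal lanes as 0x00. *)
let lanes = 32

let cmpeq_i8_model (x, o1) (y, o2) (z, o3) =
  for i = 0 to lanes - 1 do
    let eq = Bytes.get x (o1 + i) = Bytes.get y (o2 + i) in
    Bytes.set z (o3 + i) (if eq then '\xff' else '\x00')
  done

let demo_mask =
  let x = Bytes.of_string "abcdefghijklmnopqrstuvwxyz012345" in
  let y = Bytes.make lanes 'c' in
  let z = Bytes.make lanes '\x00' in
  cmpeq_i8_model (x, 0) (y, 0) (z, 0);
  z
```

In demo_mask only lane 2 is set, because 'c' is the only byte of x that matches the all-'c' buffer y.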
This was mainly an exercise in curiosity. I just wanted to learn whether something like this is viable. I also want to see if adding some type-directed magic + ppx spells can let us write data-parallel code much more naturally, similar to what lwt / async did for async code.
At the same time, you might ask - why not use something like Owl (which already has good support for data-parallel operations)? Apart from the fact that such libraries are oriented towards numerical code, I would also like to explore if we can operate directly on OCaml types and cast them into data-parallel algorithms. Just as simdjson pushed the boundaries of JSON parsing, it would be nice to port idiomatic code to data-parallel versions in OCaml. Can we, at some point, have generic traversals of data types which are actually carried out in a data-parallel fashion?
Does it work?
Despite the limitations of the current implementation (due to foreign function calls into C), I still found some preliminary results to be interesting! Implementing the String.index function, which returns the index of the first occurrence of a char, the runtime for finding an element at the n-1 position in an array with 320000000 elements is:
serial: 1.12 seconds
simd: 0.72 seconds (1.5x)
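For readers wondering how an index search maps onto a 32-byte compare, the general shape can be sketched as below. This is not the library's code: cmpeq_block is a scalar stand-in for the SIMD call, Bytes.t stands in for Bigstring.t, and all names are hypothetical. The sketch returns an option (like String.index_opt) rather than raising Not_found.

```ocaml
(* Hypothetical sketch: index search built on a 32-byte compare primitive.
   cmpeq_block is a scalar stand-in for the SIMD call. *)
let lanes = 32

(* Writes 0xFF into lane i of mask where buf[off + i] = needle[i], else 0x00. *)
let cmpeq_block buf off needle mask =
  for i = 0 to lanes - 1 do
    let eq = Bytes.get buf (off + i) = Bytes.get needle i in
    Bytes.set mask i (if eq then '\xff' else '\x00')
  done

(* Position of the first set lane in the mask, if any. *)
let first_hit mask =
  let rec go i =
    if i = lanes then None
    else if Bytes.get mask i <> '\x00' then Some i
    else go (i + 1)
  in
  go 0

let index_blocked buf c =
  let n = Bytes.length buf in
  let needle = Bytes.make lanes c in (* the char repeated in every lane *)
  let mask = Bytes.make lanes '\x00' in
  (* Full 32-byte blocks are handled with one compare each. *)
  let rec blocks off =
    if off + lanes > n then tail off
    else begin
      cmpeq_block buf off needle mask;
      match first_hit mask with
      | Some i -> Some (off + i)
      | None -> blocks (off + lanes)
    end
  (* Leftover bytes that do not fill a block are scanned one by one. *)
  and tail i =
    if i >= n then None
    else if Bytes.get buf i = c then Some i
    else tail (i + 1)
  in
  blocks 0
```

With real intrinsics, first_hit would typically be a movemask followed by a count-trailing-zeros, so each block costs a handful of instructions instead of 32 byte comparisons. Each block still crosses the OCaml/C boundary once in a design like this, which is where call overhead would show up.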
I still have to analyze what the overhead of the function call into C is (even with
It would be interesting to see if we can create a representation which encapsulates the various SIMD ISAs such as AVX2, AVX512, NEON, SVE etc. Furthermore, it would be really interesting to see if we can use ppx to automatically widen map functions to operate on blocks of code, or automatically cast data types into a data-parallel representation.
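One possible shape for such a representation is a module signature that each ISA backend implements, with width-agnostic code written as a functor over it. The sketch below is purely illustrative: the SIMD signature, the Scalar stand-in backend, and the Count_eq functor are all hypothetical names, Bytes.t stands in for Bigstring.t, and a real backend would call into C stubs instead of looping in OCaml.

```ocaml
(* Hypothetical interface: each backend states its width in bytes and
   provides a byte-wise equality compare over that many lanes. *)
module type SIMD = sig
  val lanes : int
  val cmpeq_i8 : Bytes.t * int -> Bytes.t * int -> Bytes.t * int -> unit
end

(* Scalar stand-in used for every backend here; a real backend would
   call into C stubs instead. *)
module Scalar (W : sig val lanes : int end) : SIMD = struct
  let lanes = W.lanes
  let cmpeq_i8 (x, o1) (y, o2) (z, o3) =
    for i = 0 to lanes - 1 do
      let eq = Bytes.get x (o1 + i) = Bytes.get y (o2 + i) in
      Bytes.set z (o3 + i) (if eq then '\xff' else '\x00')
    done
end

module Avx2_like = Scalar (struct let lanes = 32 end)   (* 256-bit registers *)
module Avx512_like = Scalar (struct let lanes = 64 end) (* 512-bit registers *)

(* Width-agnostic code: counts positions where x and y hold the same byte.
   Only whole blocks are processed; a real version would also need a tail. *)
module Count_eq (S : SIMD) = struct
  let run x y =
    let n = min (Bytes.length x) (Bytes.length y) in
    let mask = Bytes.make S.lanes '\x00' in
    let total = ref 0 in
    let off = ref 0 in
    while !off + S.lanes <= n do
      S.cmpeq_i8 (x, !off) (y, !off) (mask, 0);
      Bytes.iter (fun b -> if b <> '\x00' then incr total) mask;
      off := !off + S.lanes
    done;
    !total
end

module Count32 = Count_eq (Avx2_like)
module Count64 = Count_eq (Avx512_like)
```

The same Count_eq body runs against either width, which is the property a common representation would need. Variable-length ISAs such as SVE would probably require lanes to be a runtime query rather than a constant.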
This was mostly a hobby project, so I cannot promise completing any milestones or taking feature requests etc. I definitely do not recommend using this in production, because of the lack of testing etc.