At LexiFi we recently migrated our codebase to use floatarray in place of float array in order to disable the “flat float array” mode in the compiler. If you are interested in finding out more about how we did it, we wrote a blog post about it: “Migrating to floatarray: experience report” on the LexiFi blog. Enjoy!
I still don’t understand what floatarray brought to the party – or rather – to the OCaml array orgy.
Why didn’t you simply migrate to 1D bigarrays of floats?
IIRC @xavierleroy’s idea with the flat float array optimization is that one could take any textbook numerical algorithm and implement it directly with OCaml arrays, in a natural way and with reasonable performance (i.e. no boxing).
If you disable the “flat float array” mode you lose that property but maybe get gains elsewhere. Fine.
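For concreteness, here is a minimal sketch of what that looks like (my example, not Xavier’s): a textbook dot product written directly on float array. With the flat representation the elements are stored unboxed, so a.(i) reads a raw float with no pointer chasing.

```ocaml
(* Textbook dot product on plain [float array]; with the flat float
   array representation, [a.(i)] reads an unboxed float directly. *)
let dot (a : float array) (b : float array) : float =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. (a.(i) *. b.(i))
  done;
  !s
```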
But why the heck was yet another specialized array data structure introduced in the stdlib? Didn’t we have enough with bigarrays of floats?
All these different array choices (string, bytes, array, floatarray, bigarray) are extremely annoying for writing libraries.
Why not go to float bigarrays?
The cool thing is that you can select the size of the floats you are using.
I use this in one program: 32-bit floats for large arrays, so that they occupy less space.
PS: I do as Daniel suggested: always 1D, even if the array is conceptually multidimensional.
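For instance, allocating such an array might look like this (a sketch using the stdlib Bigarray API; the function name is made up):

```ocaml
(* A 1D bigarray of 32-bit floats: half the memory of float64 for
   large arrays. Values are still read and written as OCaml [float]s,
   with a conversion to/from 32 bits at each access. *)
let make_small_floats n =
  Bigarray.Array1.create Bigarray.float32 Bigarray.c_layout n
```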
We already use 1D bigarrays in a number of places, mostly to interface with C. However, I understand that bigarrays are less efficient than floatarrays: bigarrays are allocated in the C heap, while floatarrays are allocated in the OCaml heap (the latter has faster allocation, is much better at avoiding fragmentation, etc.). Also, reads and writes to bigarrays require an extra memory indirection. All this may not matter 99% of the time, but apparently makes a difference for those writing high-performance numerical code who use a lot of short-lived float arrays.
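To make the two representations concrete, here is a small sketch (mine, not from the blog post):

```ocaml
(* floatarray: allocated in the OCaml heap, floats stored inline. *)
let squares n : floatarray =
  Float.Array.init n (fun i -> float_of_int (i * i))

(* 1D bigarray of float64: the header lives in the OCaml heap but the
   data lives in the C heap, hence the extra indirection on access. *)
let squares_big n =
  Bigarray.Array1.init Bigarray.float64 Bigarray.c_layout n
    (fun i -> float_of_int (i * i))
```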
Of course, these differences are observable in small experiments and benchmarks, but we had no way to test the two against each other in real-life code and at scale (as we had no easy way to rewrite the codebase to do so), so I cannot say for sure how much of a regression switching to 1D bigarrays would have represented “globally”.
Do you have an idea of which kind of code did that?
I suspect a lot of the short-lived float data is in the realm of small vector data (points, matrices, etc.) for which you can use fixed-size records of float fields – that’s even better, as you don’t even get bounds checks.
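For example, a minimal sketch:

```ocaml
(* A fixed-size record of float fields. OCaml stores all-float records
   flat (fields unboxed), and field access needs no bounds check. *)
type p3 = { x : float; y : float; z : float }

let norm p = sqrt ((p.x *. p.x) +. (p.y *. p.y) +. (p.z *. p.z))
```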
For large stuff – whether scalar, raster, or vector data – you will need bigarrays anyway, as you are unlikely to skip the chance to use your GPU or BLAS/LAPACK.
Basically my impression is that the introduction of floatarray was a “cover your ass” move for those willing to use --disable-flat-float-array.
And we are not done: we’ll soon get twice-boxed dynarrays – the more the better. I look forward to the introduction of floatdynarray and uchardynarray.
I think more thought should be given to these things, maybe less in the name of performance and more in the name of usability. JavaScript may have a horrible array representation, but in the end it’s a versatile data structure which gets a nicer ecosystem to work with (note though that they also do have bigarrays nowadays).
In the end I wonder if @xavierleroy’s first call – which, as far as I remember, mostly seemed to annoy compiler devs – was not a much better idea.
What is the long-term plan here? To default to --disable-flat-float-array?
I don’t have anything concrete to show. I will see if I can convince one of our quantitative developers to explain this point a bit more and if there is anything of interest I will post back here.
This question actually came up during the last dev meeting, and my recollection of the discussion is that no, there is no long-term plan to default to --disable-flat-float-array. In other words, the long-term plan seems to be to stick to the status quo.
That’s beside the point. You can always add more types to a system; that doesn’t necessarily make it better.
The problem is which type APIs agree on to interchange data so that one doesn’t have to constantly handle representation mismatches.
In this particular case, it’s unlikely to be that new Dynarray proposal, since it boxes even more than arrays do. For most of my dynarray uses, a few convenience combinators to easily extend regular arrays would have gone a long way.
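Something along these lines (a sketch; the names are hypothetical, not an existing API):

```ocaml
(* Copy of [a] with [v] appended at the end. *)
let push a v = Array.append a [| v |]

(* Copy of [a] resized to [n] slots; new slots are filled with [pad]. *)
let resize a n pad =
  Array.init n (fun i -> if i < Array.length a then a.(i) else pad)
```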