Optimizing small vector operations

The record version does not avoid any allocation in the function that takes the record as an argument, but it does avoid allocations in the functions that create the record. (It can cause extra allocations in functions that read its fields only to put them back in a boxing context, but that is not the case here.) So overall you should see fewer allocations when using records, unless you write unusual code with many boxing contexts, or split your code into fine-grained non-inlined functions in a way that introduces spurious boxing contexts. You should not, however, expect a decrease inside the function itself.
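To make the comparison concrete, here is a small sketch. The names vec, add_t, add_r and minor_words_of are illustrative and not taken from the original code. It compares a tuple-based and a record-based 2D vector addition and measures minor-heap allocation with Gc.minor_words. The exact counts depend on the compiler (bytecode vs native, flambda or not, inlining decisions), so treat the printed numbers as indicative only.

```ocaml
(* A 2D vector as a record with only float fields: stored as one flat block. *)
type vec = { x : float; y : float }

(* Tuple version: the result tuple holds pointers to two separately boxed floats. *)
let add_t (x1, y1) (x2, y2) = (x1 +. x2, y1 +. y2)

(* Record version: the result is a single block containing two raw doubles. *)
let add_r a b = { x = a.x +. b.x; y = a.y +. b.y }

(* Hypothetical helper: minor-heap words allocated while running [f] once.
   Gc.minor_words itself may add a little noise to the measurement. *)
let minor_words_of f =
  let before = Gc.minor_words () in
  ignore (Sys.opaque_identity (f ()));
  Gc.minor_words () -. before

let () =
  (* Inputs are built outside the measured closures. *)
  let ta = (1.0, 2.0) and tb = (3.0, 4.0) in
  let ra = { x = 1.0; y = 2.0 } and rb = { x = 3.0; y = 4.0 } in
  let wt = minor_words_of (fun () -> add_t ta tb) in
  let wr = minor_words_of (fun () -> add_r ra rb) in
  Printf.printf "tuple: %.0f words, record: %.0f words\n" wt wr
```

With the native compiler you would typically expect the tuple version to allocate the tuple plus two float boxes, and the record version a single block, but the counts you actually see depend on your compiler and flags.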

Note: it’s pretty easy to understand where OCaml needs to box/unbox floats, and where we hope that it will not:

  • floats are stored boxed in tuples, and unboxed in records whose fields are all floats (but not in other records), in float arrays (unless you are on a -no-flat-float-array switch), and in Floatarray.t (always, regardless of that switch option)
  • floating-point operations are typed as taking boxed arguments and returning boxed results, but in practice the optimizer eliminates the boxing of intermediate results
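One way to check these representation rules, assuming you only use it for inspection and never in production code, is to look at block tags with the unsafe Obj module. The helpers is_flat and field_is_boxed below are hypothetical names for this sketch: a flat block of raw doubles has tag Obj.double_array_tag, and a boxed float has tag Obj.double_tag.

```ocaml
type vec = { x : float; y : float }           (* all-float record *)
type tagged = { name : string; v : float }    (* mixed record *)

(* Hypothetical helper: true if the value is a flat block of raw doubles. *)
let is_flat v = Obj.tag (Obj.repr v) = Obj.double_array_tag

(* Hypothetical helper: true if field [i] of the value points to a boxed float. *)
let field_is_boxed v i = Obj.tag (Obj.field (Obj.repr v) i) = Obj.double_tag

let () =
  Printf.printf "all-float record flat: %b\n" (is_flat { x = 1.0; y = 2.0 });
  Printf.printf "tuple field boxed: %b\n" (field_is_boxed (1.0, 2.0) 0);
  Printf.printf "mixed record float field boxed: %b\n"
    (field_is_boxed { name = "p"; v = 1.0 } 1);
  (* Flat only when flat float arrays are enabled (the default configuration). *)
  Printf.printf "float array flat: %b\n" (is_flat [| 1.0; 2.0 |]);
  (* Float.Array.t is flat regardless of that configuration option. *)
  Printf.printf "Float.Array.t flat: %b\n" (is_flat (Float.Array.make 2 0.0))
```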

For example, dx *. dx +. dy *. dy will never box the intermediate results of *. It will probably box the result of +. if that value is returned across a function boundary, and whether dx and dy are passed boxed or unboxed depends on the optimizer.
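The following sketch writes that expression as a squared-distance function on all-float records and stores the results in a Float.Array, so the stored values stay unboxed in memory. The names point, dist2 and dist2_all are hypothetical. Whether the float returned by dist2 is actually boxed depends on whether the compiler inlines it at the call site; the comments describe the expected behavior of the native compiler, not a guarantee.

```ocaml
type point = { x : float; y : float }

(* Squared distance. In native code the products dx *. dx and dy *. dy are
   expected to stay unboxed; only the returned sum may be boxed, and only
   if dist2 is not inlined at its call site. *)
let dist2 p q =
  let dx = q.x -. p.x and dy = q.y -. p.y in
  dx *. dx +. dy *. dy

(* Results are written into a Float.Array, whose elements are stored unboxed. *)
let dist2_all ps q =
  let out = Float.Array.make (Array.length ps) 0.0 in
  Array.iteri (fun i p -> Float.Array.set out i (dist2 p q)) ps;
  out

let () =
  let ps = [| { x = 0.0; y = 0.0 }; { x = 1.0; y = 1.0 } |] in
  let d = dist2_all ps { x = 3.0; y = 4.0 } in
  Float.Array.iter (Printf.printf "%g\n") d
```

To see what the native compiler actually does with such code, you can compile with ocamlopt -S and inspect the generated assembly, or use ocamlopt -dcmm to dump the Cmm intermediate representation, where boxing shows up as explicit allocations.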

Some aspects of floating-point performance are tricky and hard to predict, so micro-optimizing code can bring surprises, but the broad principles above should give you a simple, clear mental model for writing efficient floating-point code in most cases.
