Optimizing small vector operations

n4323 · January 26, 2021, 4:15pm

Another timing update:

With landmarks removed entirely, the full-program run-time differences reported earlier between the _d2', _d2'', and _d2''' implementations (tuple version) seem to disappear.
With landmarks removed entirely, the full-program run time of the record version is marginally below that of the tuple version above. This is the first time that records appear to be a little faster, as expected.

So part of the surprising results I posted above now looks like user error: preprocessing with landmarks interfered with inlining (even though it’s not activated at runtime). Sorry for the noise regarding that!

[edit]
For those using Landmarks, I found that a convenient way to use it in manual mode is to insert manual L.enter and L.exit calls in a few, not too small functions, then define a do-nothing replacement module:

module Landmark_off = struct
  let register _ = ()
  let enter _ = ()
  let exit _ = ()
end

and switch profiling on/off like so:

module L =
  Landmark
  (*Landmark_off*)

As far as I can tell, this completely eliminates the calls to Landmarks after optimization, including any obstructions to inlining.
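
To make the switch concrete, here is a minimal sketch of how an instrumented function might look. The function dot and the landmark name "dot" are made up for illustration and do not come from the code discussed in this thread. The sketch binds L to the do-nothing module so that it compiles without the landmarks library; binding L to Landmark instead should enable profiling, assuming the library is linked and profiling is started (for example via Landmark.start_profiling or the OCAML_LANDMARKS environment variable).

```ocaml
(* Do-nothing stand-in with the same three entry points used below. *)
module Landmark_off = struct
  let register _ = ()
  let enter _ = ()
  let exit _ = ()
end

(* Replace Landmark_off with Landmark to turn profiling on. *)
module L = Landmark_off

(* Hypothetical landmark for a hypothetical small vector function. *)
let lm_dot = L.register "dot"

(* 2D dot product on tuples, bracketed by manual enter/exit calls. *)
let dot (x1, y1) (x2, y2) =
  L.enter lm_dot;
  let r = (x1 *. x2) +. (y1 *. y2) in
  L.exit lm_dot;
  r

let () = Printf.printf "%g\n" (dot (1.0, 2.0) (3.0, 4.0))
```

With L bound to Landmark_off, every L call is a trivial function returning unit, which is what lets the optimizer drop it. Note that this stub only covers single-argument calls; if the optional arguments of Landmark.register are used, the stub would need matching optional parameters.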
