Switch to inline record gain 25% speed up of querying in avl tree based map implementation

hackwaly · August 25, 2023, 2:38pm

By changing
from

type ('a, 'b) tree =
  | Empty
  | Node of ('a, 'b) tree * 'a * 'b * ('a, 'b) tree * int (* height *)

let rec find ~cmp t k =
  match t with
  | Empty -> None
  | Node (l, nk, nv, r, _) ->
    let cr = cmp k nk in
    if cr = 0 then Some nv else if cr < 0 then find ~cmp l k else find ~cmp r k
;;

to

type ('a, 'b) tree =
  | Empty
  | Node of
      { left : ('a, 'b) tree
      ; right : ('a, 'b) tree
      ; key : 'a
      ; value : 'b
      ; height : int
      }

let rec find ~cmp t k =
  match t with
  | Empty -> None
  | Node n ->
    let cr = cmp k n.key in
    if cr < 0
    then find ~cmp n.left k
    else if cr > 0
    then find ~cmp n.right k
    else Some n.value
;;

It seems ocaml failed to optimize the first one to move the field fetching(s) after the branches.

I noticed that the Map in stdlib also written in first way, which perform worse than latter.

nojb · August 25, 2023, 3:33pm

Interesting!

As you point out, the two codes are not equivalent; in the one using inline records you are doing the memory read later than in the one using tuples. If you were to destructure the record eagerly, ie

let rec find ~cmp t k =
  match t with
  | Empty -> None
  | Node {key; value; left; right; _} ->
    let cr = cmp k key in
    if cr < 0
    then find ~cmp left k
    else if cr > 0
    then find ~cmp right k
    else Some value

then both versions should perform equivalently. The advantage of using inline records is the lightweight “dot” syntax that allows accessing the fields of the block in a way that is not possible using tuples.

If the performance gain is as claimed, this would be a good improvement to submit upstream at GitHub - ocaml/ocaml: The core OCaml system: compilers, runtime system, base libraries.

Cheers,
Nicolas

amongonz · August 25, 2023, 4:04pm

How are you benchmarking them? Note that you have changed the order of comparisons between the two snippets, so branch prediction might be performing better in the second one, depending on the generated code. Try benchmarking the exact same code first.

hackwaly · August 25, 2023, 4:20pm

You are right. I move cr = 0 test in first one to last, it performs as fast as second one. So this is the part might need contribute to stdlib.

Topic		Replies	Views
Proposal for the replacement of Set and Map in the stdlib Community	6	559	April 3, 2025
Ocaml Record design Learning	6	1020	June 12, 2020
Stdlib.List loop unrolling Learning	11	970	January 28, 2023
Why is it five times slower to use mutable fields in records? Learning performance	4	796	December 30, 2020
Ocaml speed : recursive function optimization Learning ocaml	3	758	December 2, 2023

Switch to inline record gain 25% speed up of querying in avl tree based map implementation

Related topics