Hi everyone.
I’m in the process of implementing a functional/immutable B-Tree in OCaml and ran some small benchmarks against an unoptimised AVL Tree ported from a formal proof (I don’t want to compare with the one in the stdlib since that may have optimisations).
To my surprise, the B-Tree, regardless of the branching factor I give it, is not much faster than the unoptimised AVL Tree at finding an element or an in-order traversal. It is significantly slower at insertions because of the array copying, but that is somewhat expected.
Here is my implementation so far. The benchmarks can be run with `dune exec blibo` from the repository root, and the file I just linked is (I hope) short, well commented and readable.
Would anyone mind telling me if there is an inefficiency in my particular implementation or if these performance characteristics are to be expected? Thank you in advance.
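For context, here is a minimal sketch of the kind of immutable, array-backed B-Tree node I mean (this is illustrative only, with hypothetical names, not the code in the linked repository):

```ocaml
(* Hypothetical sketch of an immutable B-tree: each node stores sorted
   keys in an array, with one more child subtree than keys. *)
type 'a btree =
  | Leaf
  | Node of { keys : 'a array; children : 'a btree array }
  (* invariant: Array.length children = Array.length keys + 1 *)

(* Membership: scan the key array, then descend into exactly one child.
   A linear scan is used for brevity; a real implementation would
   binary-search within the node. *)
let rec mem x = function
  | Leaf -> false
  | Node { keys; children } ->
      let n = Array.length keys in
      let rec go i =
        if i = n then mem x children.(n)
        else if x = keys.(i) then true
        else if x < keys.(i) then mem x children.(i)
        else go (i + 1)
      in
      go 0
```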
Are both of these in memory? If so, what is your intuition for an in-memory B-tree beating an in-memory AVL tree?
In my limited understanding, the advantage of B-Trees dates back to when:
- we had HDDs, where a seek was expensive and a sequential read was cheap (compared to a seek)
- reads were at least 4 KB blocks
Thus, to minimize the number of seeks, instead of a branching factor of 2, we wanted a branching factor of (4 KB / key size).
Now, in the case of everything fitting in memory (and I believe x86_64 cache lines are 64 bytes?), it’s not clear to me why a B-tree would beat an AVL tree.
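To make that concrete, a back-of-the-envelope version of the (4 KB / key size) argument, assuming 8-byte keys and 8-byte child pointers (both sizes are assumptions for illustration):

```ocaml
(* Rough branching-factor estimate for a disk-resident B-tree:
   one 4 KB page holds roughly page_size / entry_size entries. *)
let page_size = 4096
let entry_size = 8 + 8  (* assumed: 8-byte key + 8-byte child pointer *)
let branching_factor = page_size / entry_size
(* One page read narrows the search by a factor of branching_factor,
   versus a factor of 2 per node for a binary tree. *)
```

With these assumed sizes the branching factor comes out to 256, so each disk seek does the work of about eight binary-tree levels.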
The implementation is in early stages, so no test yet.
I’ve managed to verify the balancing invariant by inserting manual inputs and running `dune fmt`.
So:
- Copy and paste the whole structure into utop.
- You should have a `test_tree` variable in utop. Copy and paste it into blibo.ml (`let _ = ...`).
- Run `dune fmt`. Now you can clearly see that all the nodes are at the same depth.
You can see an example of the above output in blibo.ml now starting from line 224. I don’t think maintaining the balancing invariant is a problem.
From what I’ve learned, the reason for the superior expected performance of B-Trees in RAM as well is cache locality. It’s like searching through an array vs searching through a linked list (both are O(n), but arrays are faster to search because it’s all contiguous memory in cache).
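As a tiny illustration of that locality argument, here is the same linear search written over an array and over a list; both are O(n), but the array version walks contiguous memory while the list version chases pointers:

```ocaml
(* Linear membership over a contiguous int array. *)
let array_mem x (a : int array) =
  let n = Array.length a in
  let rec go i = i < n && (a.(i) = x || go (i + 1)) in
  go 0

(* The same search over a linked list: each [::] cell is a separate
   heap block, so successive reads may miss the cache. *)
let rec list_mem x = function
  | [] -> false
  | y :: ys -> y = x || list_mem x ys
```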
I think this may still be valid in principle, because one of the benchmarks run by `dune exec blibo` is a fold/in-order traversal on both structures (an AVL Tree and a B-Tree with the same inputs), and the B-Tree takes 62% of the time the AVL Tree does for this, likely because of cache locality.
I’m surprised, though, that finding the same key takes longer with the B-Tree, and at how much the insert operation slowed down. So I was wondering if there might be something wrong with my implementation (like creating unnecessary array copies) or if this is expected behaviour.
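On the insert cost: some of it may be inherent to the immutable, array-backed representation rather than a bug. A sketch of what a copy-on-write insert into one node’s key array looks like (hypothetical names, not the code from the repository):

```ocaml
(* Copy-on-write insertion of [x] into a sorted int array: the whole
   array is rebuilt, so each insert costs O(b) copying per node on the
   root-to-leaf path, where b is the branching factor. *)
let insert_sorted (keys : int array) (x : int) : int array =
  let n = Array.length keys in
  (* find the insertion point *)
  let rec pos i = if i < n && keys.(i) < x then pos (i + 1) else i in
  let i = pos 0 in
  let out = Array.make (n + 1) x in
  Array.blit keys 0 out 0 i;             (* copy the prefix *)
  Array.blit keys i out (i + 1) (n - i); (* copy the suffix, shifted *)
  out
```

If this matches what the implementation does, a larger branching factor makes each immutable insert proportionally more expensive, which would be consistent with the slowdown observed regardless of the chosen branching factor.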