Hi everyone.
I’m in the process of implementing a functional/immutable B-Tree in OCaml and ran some small benchmarks against an unoptimised AVL Tree ported from a formal proof (I don’t want to compare with the one in the stdlib since that may have optimisations).
To my surprise, the B-Tree, regardless of the branching factor I give it, is not much faster than the unoptimised AVL Tree at finding an element or an in-order traversal. It is significantly slower at insertions because of the array copying, but that is somewhat expected.
Here is my implementation so far. The benchmarks can be run with `dune exec blibo` from the repository root, and the file I just linked is (I hope) short, well commented and readable.
Would anyone mind telling me if there is an inefficiency in my particular implementation or if these performance characteristics are to be expected? Thank you in advance.
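For context, here is a minimal sketch of the kind of immutable, array-backed B-Tree node I mean (this is illustrative only, with hypothetical names, not the code in the linked repository):

```ocaml
(* Hypothetical sketch of an immutable B-tree: each node stores sorted
   keys in an array, with one more child subtree than keys. *)
type 'a btree =
  | Leaf
  | Node of { keys : 'a array; children : 'a btree array }
  (* invariant: Array.length children = Array.length keys + 1 *)

(* Membership: scan the key array, then descend into exactly one child.
   A linear scan is used for brevity; a real implementation would
   binary-search within the node. *)
let rec mem x = function
  | Leaf -> false
  | Node { keys; children } ->
      let n = Array.length keys in
      let rec go i =
        if i = n then mem x children.(n)
        else if x = keys.(i) then true
        else if x < keys.(i) then mem x children.(i)
        else go (i + 1)
      in
      go 0
```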
Are both of these in memory? If so, what is your intuition for an in-memory B-tree beating an in-memory AVL tree?
In my limited understanding, the advantage of B-Trees dates back to when:
- we had HDDs, where a seek was expensive and a sequential read was cheap (compared to a seek)
- reads were at least 4 KB blocks
Thus, to minimize the number of seeks, instead of a branching factor of 2, we wanted a branching factor of (4 KB / key size).
Now, in the case of everything fitting in memory (and I believe x86_64 cache lines are 64 bytes?), it’s not clear to me why a B-tree would beat an AVL tree.
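To make that concrete, a back-of-the-envelope version of the (4 KB / key size) argument, assuming 8-byte keys and 8-byte child pointers (both sizes are assumptions for illustration):

```ocaml
(* Rough branching-factor estimate for a disk-resident B-tree:
   one 4 KB page holds roughly page_size / entry_size entries. *)
let page_size = 4096
let entry_size = 8 + 8  (* assumed: 8-byte key + 8-byte child pointer *)
let branching_factor = page_size / entry_size
(* One page read narrows the search by a factor of branching_factor,
   versus a factor of 2 per node for a binary tree. *)
```

With these assumed sizes the branching factor comes out to 256, so each disk seek does the work of about eight binary-tree levels.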
The implementation is in early stages, so no test yet.
I’ve managed to verify the balancing invariant by inserting manual inputs and running `dune fmt`.
So:
- Copy and paste the whole structure into utop.
- You should have a `test_tree` variable in utop. Copy and paste it into blibo.ml (`let _ = ...`).
- Run `dune fmt`. Now you can clearly see that all the nodes are at the same depth.
You can see an example of the above output in blibo.ml now starting from line 224. I don’t think maintaining the balancing invariant is a problem.
From what I’ve learned, the reason for the superior expected performance of B-Trees in RAM as well is cache locality. It’s like searching through an array vs searching through a linked list (both are O(n), but arrays are faster to search because it’s all contiguous memory in cache).
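As a tiny illustration of that locality argument, here is the same linear search written over an array and over a list; both are O(n), but the array version walks contiguous memory while the list version chases pointers:

```ocaml
(* Linear membership over a contiguous int array. *)
let array_mem x (a : int array) =
  let n = Array.length a in
  let rec go i = i < n && (a.(i) = x || go (i + 1)) in
  go 0

(* The same search over a linked list: each [::] cell is a separate
   heap block, so successive reads may miss the cache. *)
let rec list_mem x = function
  | [] -> false
  | y :: ys -> y = x || list_mem x ys
```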
I think this may still be valid in principle, because one of the benchmarks run by `dune exec blibo` is a fold/in-order traversal on both structures (an AVL Tree and a B-Tree with the same inputs), and the B-Tree takes 62% of the time the AVL Tree does for this, likely because of cache locality.
I’m surprised, though, that finding the same key takes longer with the B-Tree, and at how much the insert operation slowed down. So I was wondering if there might be something wrong with my implementation (like creating unnecessary array copies) or if this is expected behaviour.
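On the insert cost: some of it may be inherent to the immutable, array-backed representation rather than a bug. A sketch of what a copy-on-write insert into one node’s key array looks like (hypothetical names, not the code from the repository):

```ocaml
(* Copy-on-write insertion of [x] into a sorted int array: the whole
   array is rebuilt, so each insert costs O(b) copying per node on the
   root-to-leaf path, where b is the branching factor. *)
let insert_sorted (keys : int array) (x : int) : int array =
  let n = Array.length keys in
  (* find the insertion point *)
  let rec pos i = if i < n && keys.(i) < x then pos (i + 1) else i in
  let i = pos 0 in
  let out = Array.make (n + 1) x in
  Array.blit keys 0 out 0 i;             (* copy the prefix *)
  Array.blit keys i out (i + 1) (n - i); (* copy the suffix, shifted *)
  out
```

If this matches what the implementation does, a larger branching factor makes each immutable insert proportionally more expensive, which would be consistent with the slowdown observed regardless of the chosen branching factor.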