The read operations that you’re using introduce bounds checks. These checks are not currently optimized away, so you pay a bunch of extra comparisons and a branch on every read.
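To make the cost concrete, here is a tiny illustration of the check each safe read performs: `Bytes.get_int64_ne` verifies that the 8 bytes starting at the index lie inside the buffer and raises `Invalid_argument` otherwise. `checked_read` is a hypothetical name used only for this sketch.

```ocaml
(* Bytes.get_int64_ne checks that bytes [i, i+7] lie inside [b] before
   loading them, and raises Invalid_argument otherwise.  That check runs
   on every call.  [checked_read] is a hypothetical helper name. *)
let checked_read (b : bytes) (i : int) : int64 option =
  match Bytes.get_int64_ne b i with
  | v -> Some v
  | exception Invalid_argument _ -> None

let () =
  let b = Bytes.make 12 '\000' in
  let show = function Some _ -> "loaded" | None -> "rejected" in
  (* i = 4 covers bytes 4..11: in range *)
  Printf.printf "i=4: %s\n" (show (checked_read b 4));
  (* i = 5 would need byte 12: the bounds check fires *)
  Printf.printf "i=5: %s\n" (show (checked_read b 5))
```

In a comparison loop, that test (plus its branch) is repeated for every word loaded from each buffer.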
There are unsafe versions of the same primitives, but they might not be exported by the standard library (check ocplib-endian, which likely defines them all).
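If you want to avoid a dependency, one option is to bind the unchecked load yourself and validate the range once, up front. This sketch assumes the compiler primitive `"%caml_bytes_get64u"` (the unchecked counterpart of the load behind `Bytes.get_int64_ne`, and the one ocplib-endian binds); `all_zero_words` is a hypothetical example function, not something from your code.

```ocaml
(* Assumption: "%caml_bytes_get64u" is the compiler's unchecked 64-bit
   native-endian load.  It performs NO bounds check: reading outside the
   buffer is memory-unsafe, so the caller must validate the range. *)
external unsafe_get_int64_ne : bytes -> int -> int64 = "%caml_bytes_get64u"

(* Hypothetical helper: are the [n_words] 64-bit words starting at [pos]
   all zero?  The range is checked once here (written to avoid integer
   overflow), and the loop then uses unchecked loads. *)
let all_zero_words (b : bytes) (pos : int) (n_words : int) : bool =
  let len = Bytes.length b in
  if pos < 0 || pos > len || n_words < 0 || n_words > (len - pos) / 8 then
    invalid_arg "all_zero_words";
  let rec loop w =
    w >= n_words
    || (unsafe_get_int64_ne b (pos + (w * 8)) = 0L && loop (w + 1))
  in
  loop 0
```

The design choice is simply to move the safety argument out of the loop: one range check covers every load the loop performs.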
Also, I just checked, and `Int64.equal` is implemented as `compare x y = 0`, which is silly and slow. Use `Bytes.get_int64_ne a (i + a_pos) = Bytes.get_int64_ne b (i + b_pos)` instead; you should see a noticeable boost.
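Side by side, the two forms look like this. Both return the same result; the difference is only in the generated code. Because both operands are statically `int64`, the compiler can specialize `=` to a direct int64 comparison rather than going through `compare`. `word_equal` and `word_equal_slow` are hypothetical names; `a_pos`, `b_pos` and `i` follow the expression above.

```ocaml
(* Goes through Int64.equal, i.e. a full compare followed by "= 0". *)
let word_equal_slow (a : bytes) (a_pos : int) (b : bytes) (b_pos : int) (i : int) : bool =
  Int64.equal (Bytes.get_int64_ne a (i + a_pos)) (Bytes.get_int64_ne b (i + b_pos))

(* Both sides are known to be int64, so (=) is specialized by the compiler
   to an int64 equality test instead of a polymorphic comparison. *)
let word_equal (a : bytes) (a_pos : int) (b : bytes) (b_pos : int) (i : int) : bool =
  Bytes.get_int64_ne a (i + a_pos) = Bytes.get_int64_ne b (i + b_pos)

let () =
  let a = Bytes.of_string "xxABCDEFGH" and b = Bytes.of_string "ABCDEFGHyy" in
  (* bytes 2..9 of [a] and bytes 0..7 of [b] are both "ABCDEFGH" *)
  Printf.printf "%b %b\n" (word_equal a 2 b 0 0) (word_equal_slow a 2 b 0 0)
```

This prints `true true`.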
Musl libc iterates byte by byte, while the implementation in the OCaml runtime iterates word by word, like your own implementation (see ocaml/ocaml pull request #12427, "Delegate implementation of caml_string_equal to libc's memcmp" by krtab, for a short relevant discussion).
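For reference, a rough OCaml sketch of the word-at-a-time strategy: compare 8 bytes per step with int64 loads, then finish the remaining `len mod 8` bytes one at a time. This is not the runtime's C code; `equal_sub` and its argument order are hypothetical. It uses the checked accessors for simplicity, so each load still pays a bounds check; an up-front range check plus unchecked loads would remove that.

```ocaml
(* Hypothetical: do bytes [a_pos, a_pos+len) of [a] equal bytes
   [b_pos, b_pos+len) of [b]?  Word loop first, then a byte-wise tail. *)
let equal_sub (a : bytes) (a_pos : int) (b : bytes) (b_pos : int) (len : int) : bool =
  if len < 0 then invalid_arg "equal_sub";
  let words = len / 8 in
  (* 8 bytes per step; (=) on int64 is specialized, no Int64.equal *)
  let rec words_eq w =
    w >= words
    || (let i = w * 8 in
        Bytes.get_int64_ne a (a_pos + i) = Bytes.get_int64_ne b (b_pos + i)
        && words_eq (w + 1))
  (* leftover bytes that do not fill a whole word *)
  and bytes_eq i =
    i >= len
    || (Bytes.get a (a_pos + i) = Bytes.get b (b_pos + i) && bytes_eq (i + 1))
  in
  words_eq 0 && bytes_eq (words * 8)

let () =
  let a = Bytes.of_string "hello, world!" and b = Bytes.of_string ">>hello, world?" in
  Printf.printf "%b %b\n" (equal_sub a 0 b 2 12) (equal_sub a 0 b 2 13)
```

This prints `true false`: the first 12 bytes match (one word plus a 4-byte tail), and the 13th differs (`!` vs `?`).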