In Core, how does the `compare` function (of keys) affect the performance of a hash table?

Hi,

I’m using the module Hashtbl of Core to create a hash table whose key is of the type Llvm.llvalue. Here is the function Hashtbl.create:

let create ?growth_allowed ?size m =
  create ~hashable:(Hashable.of_key m) ?growth_allowed ?size ()
;;

In my code, I need to define a wrapping module Llvalue of Llvm.llvalue, and call Hashtbl.create (module Llvalue). The module Llvalue is like follow:

module Llvalue = struct
  type t = Llvm.llvalue

  let hash = Hashtbl.hash
  let sexp_of_t = LL.string_of_llvalue
  let compare = ???             (* how should I define `compare`? *)
end

Since Llvm does not provide a built-in compare function, how should I define it to optimize the performance of the hash table?

My simple solution is to convert Llvm.llvalue to strings, and compare the output strings, but I think it might not be the best solution.

Thanks for spending your time to read my question!

Beware that Llvm.string_of_llvalue is incredibly slow on instructions and functions.

Do you need something coarser than pointer comparison?

I use polymorphic compare (Poly.compare in base) on llvalues. But I expect that will stop working soon since llvalues are naked pointers.

@jjb: Thanks for the suggestion on Poly.compare.

Previously, I also use Hashtbl.Poly.create (), which may be similar to your approach, where I don’t have to manually define the compare function for the wrapping module Llvalue.

I only need a coarse implementation of compare so that I can implement Hashtbl and Set of Llvm.llvalue, without having to use Poly.

Do you have any other suggestion?

Yes, Hashtbl.Poly uses the polymorphic compare, so explicitly using Poly.compare will use the same comparison function. If this is very hot, there might be some magic hacks to cast these pointers to integers, but I won’t advise doing that.

Just don’t call string_of_llvalue, it constructs the whole debug info table for the whole enclosing compilation unit (llmodule) and throws it away for each instruction or function.

On second thought, my mentioning naked pointers was an irrelevant tangent. The llvm bindings will need to be rewritten/overhauled to work in --no-naked-pointers mode anyhow.