Looking for a specific data structure

Hi !

I’m looking for a data structure that matches the following conditions :

  • As fast as possible
  • Not necessarily ordered
  • Get intersection of two objects
  • Immutable
  • Get length of data strucure in O(1) (as fast as possible)

Indeed the purpose of such a data structure would be to count the number of common elements (which happen to be int) between multiple objects . On paper, Set seemed to be the perfect choice as Set.Make.intersect is actually really fast but it turns out Set.Make.cardinal is pretty slow when considering big Sets…

Thank you for helping me !

If you are happy with sets except for the non-constant time cardinal operation, you could memoize the cardinal computation, eg by considering pairs (set, cardinal). You will still need to pay the logarithmic cost of computing the cardinal, but only once per set.

Cheers,
Nicolas

Thank you for your answer ! This could be a good solution (and I actually use this trick somewhere else in my code) but the problem is that I can’t pre-compute the cardinal of all set intersections as there are more than a billion intersections possible in my case…

Base.Set returns length in O(1). base/src/set_intf.ml at ff1270481090f3f34db7d1d35d97144102bce4d0 · janestreet/base · GitHub

2 Likes

Thank you ! Indeed this allowed a way faster computation :+1:

grenier has a balmap library which seems to fit your requirements. If you don’t have base in your cone of dependency and want to keep it small, it might be worth checking.

1 Like