Type for mutable sets

It looks like the Stdlib has no data structure for (mutable) sets ? Indeed, OCaml library : Set.Make represents (immutable) totally ordered sets.

Can I simply define

type 'a set = ('a, unit) Hashtbl.t

(see OCaml library : Hashtbl) and be content with it ? Or are there better ways ?

It depends a bit on why you need mutable sets. The Hashtbl option is perfectly fine, but the other obvious solution is to use Set.Make to create immutable sets and manipulate references to sets for mutability.

1 Like

Thanks. Set.make creates totally ordered sets, and there is no natural total order on the things I want to collect in a set. (Of course, I could put one, but this would feel arbitrary and superfluous.)

As for the general trick of “emulating” a mutable data structure by taking a reference to its immutable counterpart, it seems that it would hamper performance ? I mean: the performance advantage of mutability is that you do not have to recreate (reallocate?) a new structure each time you mutate it, but if you mutate a reference to an immutable data structure, you have to create a new contents field anyway, so this amounts to recreating an object of type the immutable data structure you started with, and nothing is gained ?

Relative performance will depend on which operations you use often. Sets represented as hash tables are fine if your hash function is fast and you don’t need the binary operations like union. For binary operations, the immutable representation is expected to allocate less in most cases.

1 Like

Not always. Keep in mind that a big advantage of immutable data structures is that they can use sharing. Typically, adding an element to a set isn’t much more expensive than adding an element to a list, regardless of the size of the set. This difference could also become quite important if you need backtracking: with a typical mutable structure (such as Hashtbl), if you want to save the current state for later you need a deep copy. If you use a reference instead this can be done by simply storing the current value of the reference, which is more or less free.

The argument about not having to reallocate is usually only true for big records, as my_record.f <- new_value can indeed be much less expensive than { my_record with f = new_value }. But sets are almost always going to be implemented by lots of small cells linked together (whether using mutable or immutable structures), so adding an extra reference cell on top of the structure doesn’t change much.

3 Likes