Hi there, I am trying to build a graph using adjacency list.
There may be duplicated edges in the source file, so my idea is to build a hash table where the key is node and data is a node set.
I will read source file line by line and build the hash table on fly. However, set is immutable and addition will lead to a huge overhead for rebuilding original set and bind the new set to hash table.
Is there any better idea to avoid this overhead? Or am I missing anything for the OCaml immutable mechanisms?
To add to @octachron’s remark, I think you may have the wrong mental image of trees such as immutable sets. The principle of adding an item to a set is comparable to building a list by successively cons’ing items onto the (front of the) list. (There is obviously a little more to it with a self-balancing tree like a set, but the principle is there.)
Thank you for pointing my mistake, but I am still confused about this “immutable” property.
Say am passing a set as an argument and call the function recursively as below
let rec list_to_set (s : IntSet) (l : int list)=
match l with
[] -> s
| h::t -> IntSet.add s h;
list_to_set s t;
If list l has 10 elements, so am I creating 10 sets during the recursion? Or each time I am passing s by reference instead of by value (so there is only one set)?
There is no need to build a deep copy of a set to add a new element. When adding a new element, we only need to create new nodes for the node on the path to the new elements, and those nodes share the unmodified subtree with the original node. Thus only, O(ln n) new nodes are created when adding a new element.
Also, the correct function to build a set from a list would be:
let rec list_to_set s l = match l with
| [] -> s
| h :: t -> list_to_set (Int_set.add h s) t
This function does create one set by element, but sets are cheap to create.
Thank you for the information! It makes sense to create O(ln n) new nodes in a new subtree.
It may be irrelevant, but I am quite confused why OCaml use binary tree to implement Set. I think hash table may be a better choice because Set does not care the order where BST may beat hash table. Also, the amortized cost O(1) for mem is much cheaper for hash table.
You can use Hashtbl as a set if you wish, but neither Set nor Hashtbl are strictly superior to each other.
In particular, Set is immutable and the mem function has a worse case complexity of O(ln n) compared to O(n) for Hashtbl. Set is thus often a safe, simple to reason about, and reasonably performant alternative to Hashtbl. But of course, the performance characteristic of Hashtbls (or the mutability) might make them a better fit for specific issues.