Best practice to build adjacency list

Hi there, I am trying to build a graph using adjacency list.

There may be duplicated edges in the source file, so my idea is to build a hash table where the key is node and data is a node set.

I will read source file line by line and build the hash table on fly. However, set is immutable and addition will lead to a huge overhead for rebuilding original set and bind the new set to hash table.

Is there any better idea to avoid this overhead? Or am I missing anything for the OCaml immutable mechanisms?

Cheers

Not sure how much this helps, but have you considered using the wonderful ocamlgraph library, instead of a hash table?

I am not sure what you mean by “huge overhead for rebuilding original set”. Adding a new element to a set is efficient.

1 Like

To add to @octachron’s remark, I think you may have the wrong mental image of trees such as immutable sets. The principle of adding an item to a set is comparable to building a list by successively cons’ing items onto the (front of the) list. (There is obviously a little more to it with a self-balancing tree like a set, but the principle is there.)

1 Like

To add to @cvine’s remark, a good way to get the right mental images of purely functional data structures is to read Chris Okasaki’s Purely Functional Data Structures book.

2 Likes

I am new to OCaml and functional programming, so I think it would be a good way to understand the language by implementing a graph by myself.

Still, thank you for the recommendation and I will check it!

I understand that adding an element to a set is O(log n) if it relies on a binary balanced tree.

However, the “huge overhead for rebuilding” I thought is that we need to deep copy the original set and then add the new element to the new set.

Please correct me if I misunderstand the “immutable” property for set here…

Thank you for pointing my mistake, but I am still confused about this “immutable” property.

Say am passing a set as an argument and call the function recursively as below

let rec list_to_set (s : IntSet) (l : int list)= 
match l with
[] -> s
| h::t -> IntSet.add s h;
          list_to_set s t;

If list l has 10 elements, so am I creating 10 sets during the recursion? Or each time I am passing s by reference instead of by value (so there is only one set)?

Cheers! I am going to check it :wink:

There is no need to build a deep copy of a set to add a new element. When adding a new element, we only need to create new nodes for the node on the path to the new elements, and those nodes share the unmodified subtree with the original node. Thus only, O(ln n) new nodes are created when adding a new element.

Also, the correct function to build a set from a list would be:

let rec list_to_set s l = match l with
| [] -> s
| h :: t -> list_to_set (Int_set.add h s) t

This function does create one set by element, but sets are cheap to create.

1 Like

The diagrams in this section give a good picture of what @octachron is talking about. Notice the node sharing between the old and the new tree.

1 Like

Thank you for the diagram. It is much clear now!

Thank you for the information! It makes sense to create O(ln n) new nodes in a new subtree.

It may be irrelevant, but I am quite confused why OCaml use binary tree to implement Set. I think hash table may be a better choice because Set does not care the order where BST may beat hash table. Also, the amortized cost O(1) for mem is much cheaper for hash table.

You can use Hashtbl as a set if you wish, but neither Set nor Hashtbl are strictly superior to each other.

In particular, Set is immutable and the mem function has a worse case complexity of O(ln n) compared to O(n) for Hashtbl. Set is thus often a safe, simple to reason about, and reasonably performant alternative to Hashtbl. But of course, the performance characteristic of Hashtbls (or the mutability) might make them a better fit for specific issues.

2 Likes