Best practice to build adjacency list

Tom_H · May 11, 2022, 12:35am

Hi there, I am trying to build a graph using an adjacency list.

There may be duplicated edges in the source file, so my idea is to build a hash table where each key is a node and the associated value is a set of nodes.

I will read the source file line by line and build the hash table on the fly. However, sets are immutable, so I expect each addition to incur a huge overhead: rebuilding the original set and then binding the new set in the hash table.

Is there a better way to avoid this overhead? Or am I missing something about OCaml's immutability mechanisms?

Cheers

ahem · May 11, 2022, 2:08am

Not sure how much this helps, but have you considered using the wonderful ocamlgraph library, instead of a hash table?

octachron · May 11, 2022, 6:56am

I am not sure what you mean by “huge overhead for rebuilding original set”. Adding a new element to a set is efficient.
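
For the use case in the original question, a minimal sketch could look like the following. It assumes integer node identifiers, directed edges, OCaml 4.08 or later (for the `Int` module), and edges that have already been parsed into pairs; the names `add_edge`, `build_graph` and `neighbours` are made up for illustration.

```ocaml
module Int_set = Set.Make (Int)

(* Adjacency list: node -> set of neighbours. *)
let add_edge (g : (int, Int_set.t) Hashtbl.t) (u : int) (v : int) : unit =
  let current =
    match Hashtbl.find_opt g u with
    | Some s -> s
    | None -> Int_set.empty
  in
  (* Int_set.add returns a new set; Hashtbl.replace rebinds the key to it.
     A duplicated edge leaves the set contents unchanged. *)
  Hashtbl.replace g u (Int_set.add v current)

(* Hypothetical input: a list of already parsed (source, target) pairs. *)
let build_graph (edges : (int * int) list) : (int, Int_set.t) Hashtbl.t =
  let g = Hashtbl.create 16 in
  List.iter (fun (u, v) -> add_edge g u v) edges;
  g

(* Neighbours of a node in increasing order; empty list for unknown nodes. *)
let neighbours g u =
  match Hashtbl.find_opt g u with
  | Some s -> Int_set.elements s
  | None -> []
```

For an undirected graph, one option is to call `add_edge` twice per input pair, once in each direction.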

cvine · May 11, 2022, 10:48am

To add to @octachron’s remark, I think you may have the wrong mental image of trees such as immutable sets. The principle of adding an item to a set is comparable to building a list by successively cons’ing items onto the (front of the) list. (There is obviously a little more to it with a self-balancing tree like a set, but the principle is there.)

dbuenzli · May 11, 2022, 12:45pm

To add to @cvine’s remark, a good way to get the right mental images of purely functional data structures is to read Chris Okasaki’s Purely Functional Data Structures book.

Tom_H · May 12, 2022, 2:01am

I am new to OCaml and functional programming, so I think it would be a good way to understand the language by implementing a graph by myself.

Still, thank you for the recommendation and I will check it!

Tom_H · May 12, 2022, 2:13am

I understand that adding an element to a set is O(log n) if it relies on a binary balanced tree.

However, the “huge overhead for rebuilding” I had in mind is that we need to deep copy the original set and then add the new element to the copy.

Please correct me if I misunderstand the “immutable” property of sets here…

Tom_H · May 12, 2022, 2:32am

Thank you for pointing out my mistake, but I am still confused about this “immutable” property.

Say I am passing a set as an argument and calling the function recursively, as below:

let rec list_to_set (s : IntSet) (l : int list)= 
match l with
[] -> s
| h::t -> IntSet.add s h;
          list_to_set s t;

If list l has 10 elements, am I creating 10 sets during the recursion? Or am I passing s by reference each time instead of by value (so there is only one set)?

Tom_H · May 12, 2022, 2:34am

Cheers! I am going to check it

octachron · May 12, 2022, 6:59am

There is no need to build a deep copy of a set to add a new element. When adding a new element, we only need to create new nodes for the nodes on the path to the new element, and those new nodes share the unmodified subtrees with the original nodes. Thus only O(ln n) new nodes are created when adding a new element.
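
A small sketch of what this means in practice, assuming `Int_set` is defined as `Set.Make (Int)` (OCaml 4.08 or later): the old set stays valid after an addition, and adding an element that is already present returns the original set itself, which the standard library documents as being physically equal.

```ocaml
module Int_set = Set.Make (Int)

let s0 = Int_set.of_list [ 1; 2; 3; 4; 5 ]

(* Adding an element returns a new set; s0 itself is left untouched. *)
let s1 = Int_set.add 6 s0

(* Adding an element that is already present returns the same value. *)
let s2 = Int_set.add 3 s0

let () =
  Printf.printf "s0 has %d elements, s1 has %d\n"
    (Int_set.cardinal s0) (Int_set.cardinal s1);
  Printf.printf "6 in s0: %b\n" (Int_set.mem 6 s0);
  Printf.printf "s2 == s0: %b\n" (s2 == s0)
```

Both `s0` and `s1` remain usable afterwards; no copy of `s0` was needed to build `s1`.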

Also, the correct function to build a set from a list would be:

let rec list_to_set s l = match l with
| [] -> s
| h :: t -> list_to_set (Int_set.add h s) t

This function does create one set per element, but sets are cheap to create.
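
For anyone who wants to run this, here is a self-contained version. It assumes `Int_set` is `Set.Make (Int)`, which the snippet above does not define; `list_to_set'` is an illustrative name for the same accumulation written as a fold.

```ocaml
module Int_set = Set.Make (Int)

(* Thread the accumulated set through the recursion. *)
let rec list_to_set s l =
  match l with
  | [] -> s
  | h :: t -> list_to_set (Int_set.add h s) t

(* The same accumulation expressed with List.fold_left. *)
let list_to_set' l =
  List.fold_left (fun s x -> Int_set.add x s) Int_set.empty l

let () =
  let s = list_to_set Int_set.empty [ 3; 1; 2; 3; 1 ] in
  (* Duplicates collapse and elements come back in increasing order. *)
  List.iter (Printf.printf "%d ") (Int_set.elements s);
  print_newline ()
```

The standard library also provides `Int_set.of_list` for this exact job.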

dbuenzli · May 12, 2022, 1:35pm

The diagrams in this section give a good picture of what @octachron is talking about. Notice the node sharing between the old and the new tree.

Tom_H · May 12, 2022, 11:22pm

Thank you for the diagram. It is much clearer now!

Tom_H · May 12, 2022, 11:24pm

Thank you for the information! It makes sense to create O(ln n) new nodes in a new subtree.

It may be off topic, but I am confused about why OCaml uses a binary tree to implement Set. I think a hash table may be a better choice, because a set does not care about ordering, which is where a BST may beat a hash table. Also, the amortized O(1) cost of mem is much cheaper for a hash table.

octachron · May 13, 2022, 8:04am

You can use Hashtbl as a set if you wish, but neither Set nor Hashtbl is strictly superior to the other.

In particular, Set is immutable and its mem function has a worst-case complexity of O(ln n), compared to O(n) for Hashtbl. Set is thus often a safe, simple to reason about, and reasonably performant alternative to Hashtbl. But of course, the performance characteristics of Hashtbl (or its mutability) might make it a better fit for specific problems.
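
As a rough illustration of the two options side by side, here is a sketch that assumes integer elements and uses a `(int, unit) Hashtbl.t` as the mutable set. That encoding is just one common convention, not something the standard library prescribes.

```ocaml
module Int_set = Set.Make (Int)

(* Immutable set: each add returns a new value. *)
let s = Int_set.add 2 (Int_set.add 1 Int_set.empty)

(* Mutable "set": a hash table whose values carry no information. *)
let h : (int, unit) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace h 1 ();
  Hashtbl.replace h 2 ();
  (* replace rather than add, so a repeated key does not stack bindings *)
  Hashtbl.replace h 2 ()

let () =
  Printf.printf "Set:     mem 2 = %b, size = %d\n"
    (Int_set.mem 2 s) (Int_set.cardinal s);
  Printf.printf "Hashtbl: mem 2 = %b, size = %d\n"
    (Hashtbl.mem h 2) (Hashtbl.length h)
```

The table is updated in place, so any code holding `h` sees the change, whereas an older `Int_set.t` value is never affected by later additions.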
