How to find the key of the largest element in a hash table

Hi, this may seem trivial, but after searching for an hour, I realized that few people asked this question before (at least in OCaml). Please correct me if I am doing it in a wrong way.

I am using Hashtbl from Core to build hash table now. Let’s say I have a hash table where key is the student name and value is his/her test score. How can I get the student name with highest score?

Some starter code may look like

let table = Hashtbl.create (module String);
Hashtbl.set table ~key:"Tom" ~data:90;
Hashtbl.set table ~key:"Jerry" ~data:97;
Hashtbl.set table ~key:"Bob" ~data:88;

We have to traverse the whole hash table to find the highest score, so I tried to use Hashtbl.fold. However I cannot figure out whether student name or score should be the accumulator in this case. Or is there any other approach to find the key of largest element?

Cheers!

You’re going to need to keep both the highest score (to compare to the previous highest as you go) and the student name as you fold. You keep them in a pair (max_score, max_name).

Alternatively, because hashtables are so fast, you could keep only the max_name and every time you compare to the next score you find, access the max_score that corresponds to max_name using the hashtable.

I happen to think the first option is more elegant, but it’s up to you.

1 Like

Possible implementation below (returns None if the table is empty):

let find_max (tbl : (string, int) Hashtbl.t) : (string * int) option =
  Hashtbl.fold tbl ~init:None ~f:(fun ~key:name ~data:score accu ->
      match accu with
      | None -> Some (name, score)
      | Some (name', score') ->
        if score > score' then Some (name, score) else accu
    )

Cheers,
Nicolas

3 Likes

That’s great, but the lambda function seems only accept one value as accumulator. The signature for Hashtbl.fold is below

('a, 'b) Hashtbl_intf.Hashtbl.t ->
init:'c -> f:(key:'a -> data:'b -> 'c -> 'c) -> 'c

Or maybe I can create a type like (max_name, max_score) as 'c in this case?

2 Likes

This makes sense to me! Cheers!

For the record, (max_score, max_name) is a tuple and as such a single value. The code from @nojb even uses a tuple option to signal that in the beginning there is no student with a highest score (and if the Hashtbl is empty, there might not be one at all).

1 Like