Idiomatic group_by/distinct_by/count_by

I love using pipelines of combinators to munge data but OCaml’s stdlib is missing some obvious ones such as group_by/distinct_by/count_by. You can write them easily enough in vanilla OCaml using Hashtbl, e.g. Array.distinct_by:

let distinct_by f xs =
  let seen = Hashtbl.create 16 in
  let result = ExtArray.empty() in
  xs
  |> Array.iter (fun x ->
    let fx = f x in
    if Hashtbl.mem seen fx then () else
      ( Hashtbl.add seen fx ();
        ExtArray.append result x ));
  ExtArray.to_array result

but that relies upon polymorphic equality and hashing. If I want to write these kinds of functions using functors I guess I’d use first-class modules to pass a concrete Hashtbl implementation (generated by Hashtbl.Make) into that function. What exactly would that look like?

Furthermore, can I factor out any commonality between equivalent functions for different ordered collections such as List, Array, Stack, Queue and so on using functors?

Here’s an example from the iter library iter/Iter.ml at master · c-cube/iter · GitHub. I’m a fan of using iterators to abstract over different collection types. The stdlib has Seq now, which can serve the same purpose.

2 Likes

Funnily enough, I played with the problem of implementing classifiers recently. Here is what I came with. One of the attempts use first-class modules indeed. Not saying that’s the best possible interface; I let you judge. That might at least answer your question:

What exactly would that look like?

1 Like