Efficiently checking if Map/Set is singleton?

Map/Set provide O(1) is_empty functions, but they don’t provide is_singleton functions. Concretely, I would like O(1) functions

Set.(is_singleton : 'val t -> 'val option)
Map.(is_singlelton : ('key,'val) t -> ('key * 'val) option)

Maybe the length functions are guaranteed to be O(1), and so I can achieve this simply by using is_length followed by to_list?

Depends on whether you’re using the Core Map or the stdlib Map. Core’s (Base’s) Map and Set return length in O(1), but the stdlib’s are O(n).

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Name         β”‚    Time/Run β”‚ Percentage β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ stdlib:1     β”‚      9.56ns β”‚      0.02% β”‚
β”‚ stdlib:10    β”‚     51.94ns β”‚      0.10% β”‚
β”‚ stdlib:100   β”‚    505.01ns β”‚      0.92% β”‚
β”‚ stdlib:1000  β”‚  4_899.77ns β”‚      8.97% β”‚
β”‚ stdlib:10000 β”‚ 54_595.93ns β”‚    100.00% β”‚
β”‚ core:1       β”‚      3.56ns β”‚            β”‚
β”‚ core:10      β”‚      3.56ns β”‚            β”‚
β”‚ core:100     β”‚      3.67ns β”‚            β”‚
β”‚ core:1000    β”‚      3.52ns β”‚            β”‚
β”‚ core:10000   β”‚      4.01ns β”‚            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

See this gist for the benchmark. (compile with dune/jbuilder or corebuild -pkg core_bench).

2 Likes

I usually use Base/Core, so that will work for me.
Seems like an embarrassing flaw in the standard library, though…

For the stdlib you could always do something ugly like this:

module S = Set.Make (String)

exception Done

let sole_element set =
  let cell = ref None in
  begin try
      S.iter (fun x ->
          match !cell with
          | None -> cell := Some x
          | Some _ -> cell := None; raise Done)
        set;
    with Done -> ()
  end;
  !cell
1 Like

Here is another way to test for a singleton set in the standard lib:

module S = Set.Make(String)

let is_singleton s =
  try S.min_elt s = S.max_elt s with Not_found -> false

This doesn’t generalize nicely to arbitrary cardinality tests.

One more using stdlib:

module String_set = struct
  include Set.Make (String)

  let is_singleton s =
    match choose_opt s with
    | None -> false
    | Some e -> equal s (singleton e)
end
(* O(log n), a bit less polymorphic than necessary as it depends on equality of keys.
    corrected: I thought it was O(1), gsg pointed out it is logarithmic. My bad! *)
let is_singleton s =
  match split (choose s) s with
  | exception Not_found -> false (* empty *)
  | (l, _, r) -> is_empty l && is_empty r

(* polymorphic (it depends only on the structure of the container, not on elements),
   but O(log n) as iter/fold has to traverse the left-most branch before starting to iterate *)
let is_singleton s =
  match fold (fun elt count -> if count > 0 then raise Exit else count + 1) s 0 with
  | 1 -> true
  | _ -> false
  | exception Exit -> false

Should work on Set and Map (slightly adapting the fold function, doesn’t depend on elements of the map).

Another variation, using exists/for_all functions, which dontt do in order traversal, thus are O(1) (in this case):

let is_singleton s =
  not (is_empty s) && 
  let count = ref 0 in for_all (fun _ -> incr count; !count = 1) s
1 Like

O(1) for a reasonable implementation

choose is O(log n), though? That’s why it can guarantee equal elements for equal sets.

O(log n) as iter/fold has to traverse the left-most branch

Nuts.

1 Like

Arrgghh, you are right. choose is indeed O(log n) (would work with sets with a unique structure, such as patricia :P).
It happens to be min_elt in current implementation.

Seems as if the exists approach is the least bad. I suppose that occasionally having to write this kind of ugly code is the downside of abstract data types with minimalist interfaces.

Checking equality can be expensive in general, so I would like to avoid that.