Memory footprint of a weak set

I am experimenting with Weak.Make to increase sharing of strings. It works as expected that values in a weak set disappear when they are no longer referenced otherwise. But inspecting the memory of the weak set it does not seem to change.

The code below adds strings to a weak hash set, removes references to them and triggers a garbage collection. As expected the size of the sets increases and goes back to zero. But the memory reachable from it does not. Any insights?

pool memory = 37
pool size = 0
pool memory = 376285
pool size = 100000
pool memory = 376285
pool size = 0
module HashedString = struct
  type t = string

  let equal = String.equal

  let hash = Hashtbl.hash
end

module Pool = Weak.Make (HashedString)

let rec seq f = function
  | 0 -> ()
  | n ->
      f n;
      seq f (n - 1)

let pool = Pool.create 13

let add str =
  match Pool.find_opt pool str with
  | None ->
      Pool.add pool str;
      str
  | Some s -> s

let run _ =
  let x = ref [] in
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool);
  seq (fun n -> x := add (string_of_int n) :: !x) 100000;
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool);
  x := [];
  Gc.compact ();
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool)

My guess would be that Obj.reachable_words is returning the size of the intermediate structures allocated by the weak hash table to hold all those weak references (the weak references themselves are not “reachable” by the GC). As the hash table does not shrink automatically when the references in it are removed (by the GC or manually), this number does not change…

Does it make sense? I haven’t double-checked, so not sure if I am missing something.

Cheers,
Nicolás

1 Like

It seems a bit suspicious that the reachable words are exactly the same - as if the strings in the set are not considered reachable because they are behind a weak reference. So maybe the number for the large set is too small and indeed only intermediate data structures are reported. It’s still a bit disappointing that the weak set cannot be rebalanced to reduce them.

Did you try with Pool.clear ?

Calling Pool.clear improves the situation but it is not a general solution because it removes all elements. In general you want to adjust to the current size (which in general is not empty) after garbage collection.

However, I need to try to insert a few elements again to give the implementation the chance to rebalance. Obviously the GC can’t trigger that. Addendum: adding new elements after the GC cycle does not reduce memory.

pool memory = 37
pool size = 0
pool memory = 376285
pool size = 100000

 x := [];
 Gc.compact ();
 Pool.clear pool;

pool memory = 36413
pool size = 0