Memory footprint of a weak set

I am experimenting with Weak.Make to increase sharing of strings. It works as expected that values in a weak set disappear when they are no longer referenced otherwise. But inspecting the memory of the weak set it does not seem to change.

The code below adds strings to a weak hash set, removes references to them and triggers a garbage collection. As expected the size of the sets increases and goes back to zero. But the memory reachable from it does not. Any insights?

pool memory = 37
pool size = 0
pool memory = 376285
pool size = 100000
pool memory = 376285
pool size = 0
module HashedString = struct
  type t = string

  let equal = String.equal

  let hash = Hashtbl.hash
end

module Pool = Weak.Make (HashedString)

let rec seq f = function
  | 0 -> ()
  | n ->
      f n;
      seq f (n - 1)

let pool = Pool.create 13

let add str =
  match Pool.find_opt pool str with
  | None ->
      Pool.add pool str;
      str
  | Some s -> s

let run _ =
  let x = ref [] in
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool);
  seq (fun n -> x := add (string_of_int n) :: !x) 100000;
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool);
  x := [];
  Gc.compact ();
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool)

My guess would be that Obj.reachable_words is returning the size of the intermediate structures allocated by the weak hash table to hold all those weak references (the weak references themselves are not ā€œreachableā€ by the GC). As the hash table does not shrink automatically when the references in it are removed (by the GC or manually), this number does not changeā€¦

Does it make sense? I havenā€™t double-checked, so not sure if I am missing something.

Cheers,
NicolƔs

1 Like

It seems a bit suspicious that the reachable words are exactly the same - as if the strings in the set are not considered reachable because they are behind a weak reference. So maybe the number for the large set is too small and indeed only intermediate data structures are reported. Itā€™s still a bit disappointing that the weak set cannot be rebalanced to reduce them.

Did you try with Pool.clear ?

Calling Pool.clear improves the situation but it is not a general solution because it removes all elements. In general you want to adjust to the current size (which in general is not empty) after garbage collection.

However, I need to try to insert a few elements again to give the implementation the chance to rebalance. Obviously the GC canā€™t trigger that. Addendum: adding new elements after the GC cycle does not reduce memory.

pool memory = 37
pool size = 0
pool memory = 376285
pool size = 100000

 x := [];
 Gc.compact ();
 Pool.clear pool;

pool memory = 36413
pool size = 0

One thing in this code that I see is that thereā€™s no attempt to remove an empty cell itself from the collection. Thus, this code appears to produce empty cells in the end.

Do you think there should be a finalizer for each cell that would be run and remove that cell from the collection? The hashes and hash sets have a significant advantage here, since in case of arrays weā€™d likely need to either keep a list of empty cells or to figure out a routine for moving values on each deletion.

This would be a bug in the standard library then:

Each value may magically disappear from the set when it is not used by the rest of the program any more