I am experimenting with Weak.Make to increase sharing of strings. As expected, values in a weak set disappear once they are no longer referenced from anywhere else. But when I inspect the memory of the weak set itself, it does not seem to change.
The code below adds strings to a weak hash set, removes the references to them and triggers a garbage collection. As expected, the size of the set increases and goes back to zero, but the memory reachable from it does not. Any insights?
pool memory = 37
pool size = 0
pool memory = 376285
pool size = 100000
pool memory = 376285
pool size = 0
module HashedString = struct
  type t = string
  let equal = String.equal
  let hash = Hashtbl.hash
end

module Pool = Weak.Make (HashedString)

(* Apply f to n, n - 1, ..., 1. *)
let rec seq f = function
  | 0 -> ()
  | n ->
    f n;
    seq f (n - 1)

let pool = Pool.create 13

(* Return the shared copy of str, adding it to the pool if it is new. *)
let add str =
  match Pool.find_opt pool str with
  | None ->
    Pool.add pool str;
    str
  | Some s -> s

let run _ =
  let x = ref [] in
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool);
  (* Intern 100000 distinct strings, keeping them alive through x. *)
  seq (fun n -> x := add (string_of_int n) :: !x) 100000;
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool);
  (* Drop the only strong references and force a full collection. *)
  x := [];
  Gc.compact ();
  Printf.printf "pool memory = %d\n" (Obj.reachable_words (Obj.repr pool));
  Printf.printf "pool size = %d\n" (Pool.count pool)
My guess would be that Obj.reachable_words is returning the size of the intermediate structures allocated by the weak hash table to hold all those weak references (the weakly referenced values themselves are not "reachable" through that traversal). As the hash table does not shrink automatically when the references in it are removed (by the GC or manually), this number does not change.
Does that make sense? I haven't double-checked, so I am not sure if I am missing something.
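If it helps, here is a small way to check that idea (not from the original program, just a sketch): with a bare Weak.t, Obj.reachable_words should report the size of the weak array itself, and the number should not move whether the slots are filled or have been emptied by the GC.

let () =
  let w = Weak.create 1000 in
  Printf.printf "empty weak array: %d words\n"
    (Obj.reachable_words (Obj.repr w));
  (* Fill every slot with a freshly allocated string. *)
  for i = 0 to 999 do
    Weak.set w i (Some (string_of_int i))
  done;
  Printf.printf "full weak array:  %d words\n"
    (Obj.reachable_words (Obj.repr w));
  (* Nothing else references those strings, so compaction empties the slots,
     yet the weak array keeps its size. *)
  Gc.compact ();
  Printf.printf "after compaction: %d words\n"
    (Obj.reachable_words (Obj.repr w))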
It seems a bit suspicious that the reachable words are exactly the same, as if the strings in the set are not considered reachable because they sit behind a weak reference. So maybe the number for the large set is too small and indeed only the intermediate data structures are reported. It's still a bit disappointing that the weak set cannot be rebalanced to reduce them.
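To quantify that suspicion, one could measure the strings from the strong side while the list still holds them (a hypothetical helper, not part of the code above) and compare the sum with what the pool reports:

(* Total heap words reachable from the strings themselves. *)
let words_held_by strs =
  List.fold_left
    (fun acc s -> acc + Obj.reachable_words (Obj.repr s))
    0 strs

(* e.g. inside run, before x := []: *)
(* Printf.printf "strings alone = %d\n" (words_held_by !x) *)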
Calling Pool.clear improves the situation, but it is not a general solution because it removes all elements. What you really want is to shrink the table to its current population (which is usually not empty) after garbage collection.
However, I would need to insert a few elements again to give the implementation a chance to rebalance, and obviously the GC cannot trigger that. Addendum: adding new elements after the GC cycle does not reduce the memory either.
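One workaround I can think of (my own sketch, not part of the Weak.S interface) is to rebuild the pool around its surviving elements after a collection, so that the internal arrays are sized for the current population rather than the historical peak:

(* Build a fresh pool containing the survivors; callers must switch to the
   returned pool so the old, oversized one can be collected. *)
let shrink_pool old =
  let fresh = Pool.create (max 13 (Pool.count old)) in
  Pool.iter (Pool.add fresh) old;
  fresh

Since pool above is a plain top-level value, using this in the original program would mean keeping the pool in a ref and replacing it after Gc.compact ().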
pool memory = 37
pool size = 0
pool memory = 376285
pool size = 100000
x := [];
Gc.compact ();
Pool.clear pool;
pool memory = 36413
pool size = 0
One thing I see in this code is that there is no attempt to remove an empty cell itself from the collection, so in the end it appears to leave empty cells behind.
Do you think there should be a finalizer for each cell that runs and removes that cell from the collection? Hash tables and hash sets have a significant advantage here, since with arrays we'd likely need to either keep a list of empty cells or work out a routine for moving values on each deletion.
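For what it's worth, here is a toy version of the finalizer idea (my own sketch, not how Weak.Make is implemented): a strong Hashtbl keyed by hash holds one-slot weak pointers, and Gc.finalise removes a bucket once its occupant has died. Hash collisions are handled crudely, and Gc.finalise rejects values that are not heap-allocated (such as some string constants), so this only illustrates the clean-up mechanism.

let table : (int, string Weak.t) Hashtbl.t = Hashtbl.create 13

let intern s =
  let h = Hashtbl.hash s in
  let install () =
    let w = Weak.create 1 in
    Weak.set w 0 (Some s);
    Hashtbl.replace table h w;
    (* Once s is collected, drop its bucket, unless it has been reused. *)
    Gc.finalise
      (fun _ ->
        match Hashtbl.find_opt table h with
        | Some w when not (Weak.check w 0) -> Hashtbl.remove table h
        | _ -> ())
      s;
    s
  in
  match Hashtbl.find_opt table h with
  | Some w ->
    (match Weak.get w 0 with
     | Some s' when String.equal s s' -> s'  (* already interned: share it *)
     | Some _ -> s  (* hash collision: give up on sharing this string *)
     | None -> install ())  (* previous occupant died: reuse the bucket *)
  | None -> install ()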