"Collected" hash table and memoization ? (Real World OCaml)

In Imperative Programming - Real World OCaml, in the first “memoization” example, there is the following sentence:

Note that memo_table is referred to by the function, and so won’t be collected until the function returned by memoize is itself collected.

What does “collected” mean, here ? The (standard ?) meaning I can think of is that from set theory (for instance in the axiom of collection), so this would mean that the pairs (key, value) of the hash table memo_table are “collected” to be considered as a whole (as a collection) only once the function returned by memoize is called ? But this does not make much sense…

This is probably the crux to understanding the example:

let memoize f =
  let memo_table = Hashtbl.create (module Int) in
  (fun x ->
     Hashtbl.find_or_add memo_table x ~default:(fun () -> f x));;
val memoize : (int -> 'a) -> int -> 'a = <fun>

I understand what it is intended to do, but I do not understand why it works: it looks like each time memoize is called, a new (and empty) memo_table will be created, defeating the purpose of the function ?

To be explicit, if I define let f x = x + 1;; and then let g = memoize f;;, then, each time I call g (e.g., g 1;; and later g 2;;), the machine will have to compute (memoize f) 1;; and later (memoize f) 2;;, so each time there will be a new execution of Hashtbl.create. Where is the flaw in my reasoning ? (This is probably a recurring beginner’s question, but I haven’t found an answer so far.)

I think in this context it means “garbage collected”, that is that the Hashtbl will stay in memory until the function that is returned is also garbage collected.

2 Likes

Thanks @Leonidas ! That completely makes sense, and I don’t know why I did not make the connection “collected” with “garbage collected”… There is even a chapter about it in that book, although it is sixteen chapters later :frowning:
Any idea for the second question ?

Collected means garbage-collected in this context. The underlying question question is for how long the memoized function will remember the association f x = y for a value x.

Concerning the memoization, the crux of your issue seem to be that in

let f x = 1 + x
let g = memoize f
let u = g 0, g 1, g 0

the memoize function is called once, and there is only one table created.
Due to side effects, g 0 is not equivalent to (memoize f) 0.

A simpler example of this issue of semantics in presence of side-effect would be

let x = Printf.printf "x\n"; 1
let y = x + x + x

With the eager call-by-value evaluation used by OCam, the string "x\n" is printed once.

1 Like

Oh sorry, I sort of overlooked that second part. You observe rightly that calling memoize will create a new hash table every single time. But the crux is, that memoize is only called once per function that should be memoized: as you see memoize doesn’t return a value but you pass in a function and it returns another (anonymous) function that wraps your function.

let f x = Int.succ x
let memo_f = memoize f
let fourty_three = memo_f 42

memo_f is not the original function f but rather the new function that was created inside memoize and it wraps your original f — it only calls it when it doesn’t know the result yet.

1 Like

Thanks @octachron, but I am not sure what is different in the memoize example and in the following example:

let mul x y = print_endline "A"; x * y;;
let mul2 = mul 2;;
mul2 1;;
mul2 1;;

This prints “A” the last two times, and does not print it when mul2 is defined. What’s different compared with your memoize example ?

Note that the same happens if I define

let mult x y = let () = print_endline "A" in x * y;;
let mult2 = mult 2;;
mult2 4 + mult2 7;;

Then “A” is printed twice after the last command, but not when mult2 is defined.

Thanks.

That’s because let mul x y = print_endline "A"; x * y is just syntactic sugar for let mul = fun x -> fun y -> print_endline "A"; x * y". That way, when you define mul2, after one step of beta reduction, it’s as if you have wrote let mul2 y = print_endline "A"; 2 * y; a function that will print at each call. If you want something similar to memoization, test with:

let mul = fun x -> print_endline "A"; fun y -> x * y

(* compare to your example *)
let mul = fun x -> fun y -> print_endline "A"; x  * y
1 Like