Memory leak when converting coroutine (with effects) to sequence

I have a recursive function that I want to “yield” a couple of times while it performs recursion. I would then like to convert that function into a sequence, where iterating over the sequence runs the recursive function and each yield results in an item in the sequence.

I came up with something like this:

type _ Effect.t += Yield_int : int -> unit Effect.t

let fibonacci (limit : int) =
  let rec aux (a : int) (b : int) : unit =
    if a <= limit then (
      Effect.perform (Yield_int a);
      aux b (a + b))
  in
  aux 1 1

let int_seq_of_coroutine (f : unit -> unit) : int Seq.t =
 fun () ->
  try
    f ();
    Seq.Nil
  with effect Yield_int i, k -> Seq.Cons (i, Effect.Deep.continue k)

let () =
  let rec loop () =
    let fib = int_seq_of_coroutine (fun () -> fibonacci 13) in
    (* Uncommenting the following line causes memory leak *)
    (* let fib = Seq.take 1 fib in *)
    Seq.iter ignore fib;
    loop ()
  in
  loop ()

Unsurprisingly, if I don’t run the coroutine until its end, this results in memory leakage. :slightly_frowning_face:

I assume this is intentional, as a coroutine might contain finalizers, and running those (by raising an exception) during garbage collection would lead to even more surprising effects.

My question is, how can I use generators in OCaml without risking memory leakage? I assume I need to explicitly discontinue the coroutine. But where and how do I do that typically?


Regarding a similar issue with Python, I found this post on stackoverflow: “What happens to resources held by abandoned generators?”

As explained in the documentation, you can execute the following code on your continuation, which prevents the memory leak:

exception Unwind
Gc.finalise (fun k -> try ignore (Effect.Deep.discontinue k Unwind) with _ -> ()) k

Unfortunately, the documentation does not explain why this is not the default behavior of the runtime. Presumably, the reason is because there is no simple way for resume to remove such a finalizer, so the finalizer would be called even if the continuation has been resumed, which would incur a large overhead.

2 Likes

I was able to trick Python into doing something nasty (which might demonstrate why OCaml doesn’t automatically do cleanup on garbage collection).

Consider the following Python code:

(Not sure if this works on every platform and every Python implementation. Probably not.)

#!/usr/bin/env python3

def generator():
    try:
        yield "Hello"
        yield "World"
    finally:
        print("A")

def run():
    g = generator()
    next(g)
    print("B")
    a = [g]
    a.append(a)

for _ in range(100000):
    run()

I trick CPython to not do an immediate cleanup (through reference counting) but to do cleanup during garbage collection by constructing a cycling reference. But running the finalization in the context of the garbage collection leads to execution at unwanted times, resulting in the following repeated error messages:

RuntimeError: reentrant call inside <_io.BufferedWriter name='<stdout>'>
Exception ignored in: <generator object generator at 0x26619432c20>
Traceback (most recent call last):
  File "/home/jbe/tmp/generator_cleanup/test.py", line 8, in generator
    print("A")

I guess depending on what I do in the finalizers of my coroutine in OCaml, I might run into similar trouble.

This makes me wonder if control-flow inversion and garbage collection is a good match at all.

I don’t think this has much to do with control-flow inversion. What you are describing is just a general issue with finalizers. Indeed, by definition, a finalizer is run during garbage collection, which means that it is run when there is a shortage of resources. In a sense, a finalizer is always run at the worst possible time, whatever your model of execution. Therefore, as with signal handlers and asynchronous exception handlers, finalizers should do as little work as possible. (Here it means that the continuation should just let the Unwind exception flow through it, doing nothing more than freeing a few system resources.)