GC alive blocks clarification

I posted a while ago about a memory leak I was experiencing. The current hypothesis is that a field of a record is keeping the whole record alive, so it would be incredibly useful to check that I don’t have an incorrect model of what the GC is doing.

If I have some structure:

type foo = Boxed of int

type bar = {foo : foo; large_data: bytes list}

let v = 
  let bar = {foo = Boxed 1; large_data=...} in
  bar.foo

The question, then, is whether the large_data field would be garbage collected.

My mental model up to now is that bar would be GC’d and v would simply hold the value of bar.foo. Is that model correct, or would the small foo field keep the large_data field alive (hence causing the larger memory leak I’ve been experiencing)?

If this isn’t occurring in this case, can it occur more generally?

In your example v will not keep bar alive.
However, if you’re using Flambda then bar will likely be statically allocated, so the large data will be kept alive indefinitely.

You can try something like:

let[@inline never] compute_v () =
  let bar = {foo = Boxed 1; large_data=...} in
  bar.foo

let v = compute_v ()

If the code for large_data is big enough you may not even need the [@inline never] annotation.
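One way to check this behaviour empirically is to attach a finaliser to the record itself and force a major collection. This is a minimal self-contained sketch, not code from the project: the collected flag and the Bytes.create payload are made up for illustration.

```ocaml
type foo = Boxed of int
type bar = { foo : foo; large_data : bytes list }

let collected = ref false

(* [@inline never] keeps the record allocation inside this function,
   mirroring the suggestion above. The payload is built at runtime,
   so it cannot be statically allocated even under Flambda. *)
let[@inline never] compute_v () =
  let bar = { foo = Boxed 1; large_data = [ Bytes.create 1_000_000 ] } in
  (* This finaliser fires when the record itself becomes unreachable. *)
  Gc.finalise (fun _ -> collected := true) bar;
  bar.foo

let () =
  let v = compute_v () in
  (* Finish a full major cycle so unreachable blocks are collected
     and their finalisers run. *)
  Gc.full_major ();
  let (Boxed n) = v in
  Printf.printf "bar collected: %b, v = %d\n" !collected n
```

If the model above is right, this prints that bar was collected while v still holds the small Boxed block.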

If your memory leak also occurs without Flambda, I’ll need a few more details before I can point you to a potential cause.

Thanks, I’m glad that my mental model isn’t entirely wrong :slight_smile:.

So the current path for a single message is as follows:

type msg_record = {entries : entry list; term : int64}

type msg = [`MsgType of msg_record]

let () =
  let t = !t_ref in

  (* Parse from AsyncRpc server *)
  let buf = receive_from_socket () in
  let (`MsgType msg) = Bigstring.read_bin_prot buf ~pos:0 bin_msg in

  (* Apply to state machine *)
  let t' =
    let term = Int64.max msg.term t.term in
    let n = magic_number_maybe_0 in
    let entries = (List.drop msg.entries n) @ t.entries in
    {term; entries}
  in

  t_ref := t'

(There are many details removed from this, but these should be all the relevant lines)

The observed behavior is that, even when trying to explicitly copy the entries, some number of messages aren’t getting GC’d (I’ve attached finalisers to each message that increment a counter).
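The finaliser-based counting mentioned above can be sketched like this. The track helper and the allocated/collected counters are hypothetical names, not the project’s actual code; the refs stand in for parsed messages.

```ocaml
let allocated = ref 0
let collected = ref 0

(* Wrap every freshly parsed message. Note that Gc.finalise raises
   Invalid_argument on values that are not heap-allocated (immediates
   such as [] or plain ints), so such "messages" can never show up in
   the [collected] count. *)
let track msg =
  incr allocated;
  Gc.finalise (fun _ -> incr collected) msg;
  msg

let () =
  (* Simulate 100 messages that immediately become garbage. *)
  for i = 1 to 100 do
    ignore (track (ref i))
  done;
  Gc.full_major ();
  Printf.printf "allocated %d, collected %d\n" !allocated !collected
```

After Gc.full_major the two counters should match; a persistent gap between them is what would indicate messages being kept alive.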

(Unfortunately I’m seeing this behavior with and without flambda)

I’m afraid I can’t see anything wrong with this small piece of code. Do you have a pointer to the full code of your problem? I might have a look at it directly at some point.


Yep I can point you at the relevant lines:

The specific issue is with the AppendEntries messages, where they don’t seem to be getting garbage collected.

Thanks so much for the offer!

Could you give me steps to reproduce the bug? It feels like using gdb is going to be the fastest solution.

I’m just putting together a test bench for this which should hopefully show things much more clearly. I’ll post it here once I’ve got it reproducing the problem :slight_smile:

So I managed to work out what the issue was.

There were several issues. The first was that one node ended up building up a queue of messages in some tests. The other was that some of the updates were empty lists, which, being immediate values rather than heap blocks, were never reported as GC’d, accounting for that small ~10% of updates.
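The empty-list case can be checked directly: [] is represented as an immediate value, not a heap block, so Gc.finalise refuses it outright. This is a quick illustrative check, not code from the project.

```ocaml
let () =
  (* [] is the immediate 0 at runtime, so there is no heap block for
     the GC to ever free, and Gc.finalise rejects it. *)
  match Gc.finalise (fun (_ : int list) -> ()) [] with
  | () -> print_endline "finaliser attached"
  | exception Invalid_argument _ ->
      print_endline "Invalid_argument: [] is not heap-allocated"
```

So any update whose entries list happened to be empty could never trip a finaliser-based counter.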

This would then account for most of the issues seen. Thanks for offering to help me on this!