OCaml Cycle detection and Garbage Collector

Hi,

I was wondering about something, when it comes to cycle detection – if I naively make a search data structure (hash table, tree) with pointers as keys to detect cycles, then presumably that will break with the garbage collector moving values around, as the addresses are not constant.

I have confirmed my suspicion with

let a = ref 1;;
let addressof x = Printf.printf "0x%08x (%d)\n" (Obj.magic x) (Obj.magic x);;
addressof a;;
addressof a;;
Gc.compact();;
addressof a;;
addressof a;;

(* 0x3fad252353dc (70012884964316)
   0x3fad252353dc (70012884964316)

   0x3fad24debbcc (70012880468940)
   0x3fad24debbcc (70012880468940) *)

But this is something that OCaml does, if you do

type 'a t = T of 'a * 'a t * 'a t;;
let rec x = T(1,T(2,x,x),x);;
(* val x : int t = T (1, T (2, <cycle>, <cycle>), <cycle>) *)

so this started me looking at the code, but I’m somewhat lost, I suspect the cycle detection is done in/around the code

which does use a hash table, from above

and here I get a little lost, does this not have a problem due to the GC moving objects around? I don’t see any GC locks or anything like that. Or have I misunderstood something?

Apologies about asking such an unusual question, I just wanted to clarify/clear this up, in case I fundamentally misunderstood something.

Yes, the GC moves heap-allocated objects around, and updates pointers to such objects accordingly. However, this does not cause problems for this kind of code because to detect cycles you only need that the equality of pointers be preserved after running the GC (which it is), and not that the value of the pointers themselves be preserved (which it is not).

Makes sense?

Cheers,
Nicolas

Thank you Nicolas, that makes sense, and I see how that would not break an unordered data structure such as a list, but for trees and hash tables the value of the pointers matters too, as you either hash it or assume it has an ordering which doesn’t change over time. Otherwise you end up returning false negatives for mem as the element you are looking for is there but you are looking for in the wrong place.

Note that there is no way in OCaml to extract the address of the pointer representing a value, so as to use it a key in a hash table or similar, which would indeed be unsound. (Obj.magic is not part of the OCaml language.)

Cheers,
Nicolas

Hashtbl.hash in ocaml/toplevel/genprintval.ml at trunk · ocaml/ocaml · GitHub is hashing using the data (with a recursion limit so it doesn’t loop on cycles) not the pointer value.

Ah, thank you both! That’s where I was confused, I didn’t look into more detail at the extra values of the hash function

I just assumed they were something to tune the hash (and they are, just not what I thought), but looking into more details, I see

where num is initially count so same comment applies, and sz is initially limit.

A little more background to frame the solution others have already provided. There are languages where the hash-function (or at least, the built-in hash-function) uses the address of the object being hashed (or addresses of things it points-to). There are two ways that can be handled in a GC:

(1) as in the original JDK, don’t move objects around. Actually, the original JDK broke objects into two parts: an immovable constant-sized header that didn’t move, and a body that was movable. So hashing could use the address of the header.

(2) when you want fully-movable copying GC, you put a bit in the header of the object that gets turned on when the object is hashed using the default method (that uses the address of the object); then when the object is moved, its representation gets changed and you cache the hash-code in the object (e.g. in a word at the end).

2 Likes