Physical equality safe to use for hash-consed variant type?

I have a hash-consed tree-like data structure as follows:

type t =
  | True
  | False
  | Branch of { var : Var.t; hi: t; lo: t; id : int }

let id (t : t) : int =
  match t with
  | False -> -2
  | True -> -1
  | Branch { id } -> id

module Triple = struct
  type t = Var.t * int * int
    [@@deriving hash, compare, sexp]

type manager = {
  mutable next_id : int;
  branch_cache : (Triple.t, t) Hashtbl.t;

let manager () = {
  next_id = 0;
  branch_cache = Hashtbl.create (module Triple);

let branch (mgr : manager) (var : Var.t) (hi : t) (lo : t) : t =
  let triple = (var, id hi, id lo) in
  Hashtbl.find_or_add mgr.branch_cache triple ~default:(fun () ->
    let id = mgr.next_id in
    mgr.next_id <- id + 1;
    Branch { var; hi; lo; id; }

let ctrue = True
let cfalse = False

The module signature ensures that two trees are equal if and only if they have the same id (and were constructed using the same manager):

type t = private
  | True
  | False
  | Branch of {
    var: Var.t;   (** the variable on which to branch *)
    hi: t;        (** subdiagram for case [var = true] *)
    lo: t;        (** subdiagramm for case [var = false] *)
    id: int;      (** unique identifier for this diagram *)
val cfalse : t
val ctrue : t
type manager
val manager : unit -> manager
val branch : manager -> Var.t -> t -> t -> t

Now to my question: Is it safe to use physical equality on these trees?
My current implementation is more conservative and looks as follows,

let equal (t1 : t) (t2 : t) : bool =
  match t1, t2 with
  | True, True | False, False ->
  | Branch { id=id1 }, Branch { id=id2 } ->
    id1 = id2
  | _ ->

but I was wondering if it is safe to optimize this further to

let equal (t1 : t) (t2 : t) : bool = (t1 == t2)


Does equal require that the trees were constructed by the same manager? In the first implementation (pattern-matching based), I think trees constructed by different managers could be considered equal. The second implementation would not consider such trees equal.

Good point.
I am fine with the behavior of equal being arbitrary if two trees were constructed using different managers.

That optimization looks correct to me in that case. I also tried it out using quickcheck, so I’d bet it’s right.

I expect that it will work in the common case, but I’m worried that it is not guaranteed to work (depending on implementation details of the compiler). In particular, here is what the documentation says about (==):

e1 == e2 tests for physical equality of e1 and e2 . On mutable types such as references, arrays, byte sequences, records with mutable fields and objects with mutable instance variables, e1 == e2 is true if and only if physical modification of e1 also affects e2 . On non-mutable types, the behavior of ( == ) is implementation-dependent; however, it is guaranteed that e1 == e2 implies compare e1 e2 = 0 . Left-associative operator at precedence level 4/11.

This does not explicitly talk about variant types. Do they always fall in the immutable category? That would imply the optimization is not kosher (although it will likely work in practice…)

Variant types are immutable unless you use inline records, I believe. I suppose technically this might not be kosher, in which case you can still do something like:

let equal a b = phys_equal a b || structurally_equal a b
1 Like

In practice, and given your implementation, yes, but not really needed. In theory it is not safe.

The language specification is reserving a bit of freedom for the compiler optimizations. Since your tree is a non-mutable, an expression that denotes your tree is considered non-generative, therefore it could be a subject to various optimizations.

It won’t happen in practice, though. And many libraries are relying on this, e.g., Frama-C.
You may still try to be on a safer side, and use an efficient but still safe implementation like:

let equal t1 t2 = t1 == t2 || structurally_equal t1 t2

It will benefit from the hash consing, and will have a safe fall back for cases when hash consing has failed. If it never fails, it will never call stucturally_equal function for equal data. Also, probably the following implementation will make equality comparison faster,

let structurally_equal x y = Int.equal (id x) (id y) [@@inline]