Equality of LazyLists

Maybe I should submit an issue for this, but I thought I should get some discussion and feedback first.

Whether Batteries.LazyList lists are equal, or instead generate an exception, depends on whether the lists have been previously been evaluated:

# module LL = Batteries.LazyList;;
# let ns = LL.seq 0 ((+) 1) ((>) 3);;
# let ns' = LL.seq 0 ((+) 1) ((>) 3);;
# ns = ns';;
Exception: Invalid_argument "compare: functional value".
Raised by primitive operation at unknown location
Called from file "toplevel/toploop.ml", line 180, characters 17-56
# LL.(ns = ns');;  (* maybe there's a LazyList version of = ? *)
Exception: Invalid_argument "compare: functional value".
Raised by primitive operation at unknown location
Called from file "toplevel/toploop.ml", line 180, characters 17-56
# LL.last ns, LL.last ns';;  (* view last element, force realization *)
- : int * int = (2, 2)
# ns = ns';;
- : bool = true

For a list equality test to either raise an exception or return a legitimate result depending on how the list has been accessed previously seems to me arbitrary and bug-conducive. It seems as if what should happen when = is called is that all of the nodes should be evaluated, just as when last is called. (I suppose that another alternative would be to return a lazy result that needs to be forced, but I would not expect = to do that.)

I see in the source that LazyList is implemented using OCaml’s Lazy module. I don’t fully understand the Lazy module source, but I gather that it uses thunks to represent not-yet-computed elements. If a list node has not yet been evaluated, then, the equality comparison is between two functions. Since = is supposed to raise an exception when comparing functions, this explains why the first equality tests above raise the exception they do. However, it still seems bad that the test raises an exception or succeeds depending on context.

One might say, “Well what do you expect from a lazy list? It changes form depending on past accesses. Use at your own risk.” I don’t agree for this case. Equality is a very basic thing to expect to work (or not, if comparing functions). (Clojure lazy sequences never do this, btw, except maybe in some unusual cases involving lazy sequences embedded in lazy sequences.)

One might also say that lazy lists shouldn’t be compared for equality because they might be infinite. However, that would also be an argument against a last function. Yes, infinite lists can cause a problem, but I do feel that then it is the programmer’s responsibility to avoid the problems. Making = raise an exception won’t solve other problems. That is a risk that you take on if you use infinite lazy lists. (I do take on that risk, btw. I find lists of unspecified length convenient.)

It seems as if what should happen when = is called is that all of the nodes should be evaluated, just as when last is called.

Correction: What should happen, ideally, is that nodes should be evaluated until a pair of non-equal nodes is found, at which point false should be returned; otherwise continue until the end of one of the lists is found, and if all nodes so far are equal, check whether one of the lists is longer, in which case return false; otherwise return true. (I assume that this is what = does on regular, non-lazy lists.)

The documentation for (=) seems relevant; https://caml.inria.fr/pub/docs/manual-ocaml/libref/Pervasives.html

e1 = e2 tests for structural equality of e1 and e2. Mutable structures (e.g. references and arrays) are equal if and only if their current contents are structurally equal, even if the two mutable objects are not the same physical object.

As the = operator knows nothing about lazy things, I don’t see why it would force a lazy list during its operation. I would assume you really wouldn’t want that in the general case anyway. If the lazy list was infinite, you could end up with = never terminating.

Perhaps you should just define an operator on your own to do what you need in this instance.

Thanks @perry.

Well, I would not expect Pervasives to know about LazyList, but I would expect LazyList to know about =, and structural equality of lazy lists. I expect sensible behavior from a well-designed module, if possible. Maybe it’s not possible, but I think of LazyList as well designed, and worthy of improvement.

(I believe I adressed the point about infinite lists.)

Your criterion (two lazy lists are structurally equal iff all their elements are) seems easy to define:

let rec equal xs ys = match get xs, get ys with
  | None, None -> true
  | Some (x, xs), Some (y, ys) -> if x = y then equal xs ys else false
  | None, Some _
  | Some _, None -> false

Edit : you can define it shortly with for_all2

let equal xs ys = try LL.for_all2 (=) xs ys with LL.Different_list_size _ -> false

(* the exception is for the case when one is an initial segment of the other *)
LL.for_all2 (=) (LL.range 1 10) (LL.range 1 15);;
Exception: BatLazyList.Different_list_size "LazyList.for_all2".

equal (LL.range 1 10) (LL.range 1 15);;
- : bool = false
1 Like

How would it know about =? There is no operator overloading in OCaml. There is no mechanism by which operator overloading could be achieved, either, as OCaml lacks a mechanism like Haskell typeclasses (although apparently a feature called modular implicits that provides equivalent functionality is in the works). Regardless, there is no mechanism by which your expectation can be implemented.

(By the way, = in OCaml is not definable within the language itself, as things stand, it has to be built in.)

I will note, by the way, that if you try to compare two infinite data structures, it is difficult to define structural equality, period, because you would have to prove that two functions are the same, which cannot be done in the general case, full stop. Imagine if you have two infinite lazy lists generated by distinct functions, how will you show that the functions are equal, even in principle?

There are module-local operator definitions. e.g. for +:

# let m = Owl.Mat.sequential 2 2;;
val m : Owl.Mat.mat =
   C0 C1
R0  0  1
R1  2  3

# Owl.Mat.(m + m);;
- : (float, Bigarray.float64_elt) M.op_t2 =
   C0 C1
R0  0  2
R1  4  6

Regardless, there is no mechanism by which your expectation can be implemented.

That in itself wouldn’t make an imagined feature less valuable, of course.

(By the way, = in OCaml is not definable within the language itself, as things stand, it has to be built in.)

Thanks. This is what I was wondering about.

Well, apparently it is possible to redefine =, but you of course lose its polymorphism. e.g. using @kantian’s very nice definition of equal:

# let (=) = equal;;
val ( = ) : 'a LL.t -> 'a LL.t -> bool = <fun>
# (LL.range 1 10) = (LL.range 1 10);;
- : bool = true

But even if that was done as a module-local definition within LazyList, I accept that it would probably be a bad idea to define a local version of = given that it would be completely distinct from the normal = operator. The = operator is like air when you are breathing; you take it for granted, so it would be likely to surprise you when it behaved oddly, even if you learned about the new behavior in advance.

Defining another operator for LazyList would be a better idea, I suppose, despite the theoretical desirability imo of an extended =. Maybe ===? Or maybe it’s easier to just use equal.

I will note, by the way, that if you try to compare two infinite data structures, it is difficult to define structural equality, period, because you would have to prove that two functions are the same, which cannot be done in the general case, full stop. Imagine if you have two infinite lazy lists generated by distinct functions, how will you show that the functions are equal, even in principle?

No, I addressed that in the case of LazyList, and @kantian illustrated a good way to do it for an equality predicate. You just have to force evaluation of every node. We already have functions that do that for lazy lists: for_all2, iter, last, etc. (Again, the programmer should know the risk when s/he starts defining infinite structures.) A lazy structure might be implemented with function wrappers, but it’s not just a function, because the functions are replaced on demand.

I don’t see why you couldn’t do the same thing with equality of other structures, e.g. for lazy trees.

(I’ll add that I have the sense that lazy structures are not used a lot in the OCaml community at present, and there are several different lazy sequence modules, even just within Batteries, so I can’t advocate for incorporating lazy lists into the language in a way that would allow them to come under the umbrella of =, even if I personally would like that. My affection for lazy lists comes from my background with Clojure, where a lazy sequence type is a pervasive and useful aspect of the language. Clojure has different goals than OCaml, of course, but there’s no reason in principle that you couldn’t have a language with strict type inference, generally eager evaluation, and widespread use of lazy sequences. (I don’t want to use Haskell btw. I want eager evaluation most of the time and I prefer OCaml’s pragmatism.))

Module local, sure, but then you lose the original definition while you are using the new one. You can’t overload, and it only works by shadowing the original definition.

How do you force evaluation of every node in an infinite list?

I like this. Using equal avoids the original problem, which was that = throws an exception or returns a boolean depending on what’s happened before. (I consider that a very bad behavior. It would be better for = to throw an exception consistently.)

I realized, however, that this only works if the elements of the lazy list are themselves things to which = applies. The problem remains, if you define a lazy list of lazy lists or a lazy list of some other kind of sequence that exhibits the same behavior. You would have lazy lists such that equal throws an exception or returns a boolean depending on past behavior. kantian’s first definition has the same drawback.

It seems as if you might need a different equal predicate for each different embedded lazy structures of lazy structures.

This seems bad. Functions should not do this–they should not throw an exception, or not, depending on how you’ve (merely) examined a data structure in the past.

One can view the problem as stemming from the fact that a lazy structure is essentially one that’s modified imperatively when it’s used in certain ways. From that perspective it’s not surprising that a function applying to the structure behaves differently depending on past usage. But to me the intended functionality of a lazy structure is to behave the same, apart from timing, whether it’s been accessed in the past or not. The semantics should be the same, but with =, it’s not.

I do hope that modular implicits can solve the problem. I have no idea whether it will help with =, or only with other less special operators.

Hmm. Maybe a slightly kludgey solution would be to have an optional equality predicate parameter for equal?

let equal ?(eq=(=)) xs ys = try LL.for_all2 eq xs ys with LL.Different_list_size _ -> false;;

Interestingly, Enum includes an equal predicate, though LazyList doesn’t. I’ll have to investigate further.

For sure, it’s a better solution (that’s the one I thought first, but I did not write it for simplicity).

let equal ?(eq = (=)) xs ys =
  try LL.for_all2 eq xs ys
  with LL.Different_list_size _ -> false;;
val equal : ?eq:('a -> 'a -> bool) -> 'a LL.t -> 'a LL.t -> bool = <fun>

equal ~eq:equal;;
- : '_a LL.t LL.t -> '_a LL.t LL.t -> bool = <fun>

equal ~eq:(equal ~eq:equal);;
- : '_a LL.t LL.t LL.t -> '_a LL.t LL.t LL.t -> bool = <fun>

(* Important note : 
 * you have to eta-expand on one parameter to have full polymorphism
 * due to value restriction, another thing related to variance question ;-)
*)
fun xs -> equal ~eq:equal xs;;
- : 'a LL.t LL.t -> 'a LL.t LL.t -> bool = <fun>

fun xs -> equal ~eq:(fun xs -> equal ~eq:equal xs) xs;;
- : 'a LL.t LL.t LL.t -> 'a LL.t LL.t LL.t -> bool = <fun>
1 Like

According to their wiki “junior task” other modules miss an equality function. Maybe they are interested in having one for the LazyList module.

I saw you already send PR to their gihtub, maybe you could add another. In this case, it may me more useful to mimic the forall2 definition to not raise an exception :

let equal ?(eq = (=)) l1 l2 =
  let rec aux l1 l2 =
    match (next l1, next l2) with
    | (Cons (h1, t1), Cons(h2, t2)) -> eq h1 h2 && (aux t1 t2)
    | (Nil, Nil)                    -> true
    | (Cons _, Nil) | (Nil, Cons _) -> false
  in aux l1 l2

Another remark about the discussion you had with @gasche on map and map2: map2 forces the first cells when you call it but not map (which is what @gasche expected from their definitions).

let do_list n = LL.init n (fun n -> print_int n; n)
let s1,s2,s3 = do_list 5, do_list 5, do_list 5

LL.map identity s1;;
- : int LL.t = <lazy>

LL.map2 (fun x y -> x,y) s2 s3;;
00- : (int * int) LL.t = <lazy>

But this version, on the model of map, don’t:

let map2 f l1 l2  =
  let rec aux l1 l2 =
    match (next l1, next l2) with
    | (Cons (h1, t1), Cons(h2, t2)) -> (Cons (f h1 h2, lazy (aux t1 t2)))
    | (Nil, Nil)                    -> Nil
    | (Cons _, Nil) | (Nil, Cons _) -> raise (Different_list_size "LazyList.map2")
  in lazy (aux l1 l2);;

map2 (fun x y -> x,y) s2 s3;;
- : (int * int) LL.t = <lazy>

I don’t have a github account, but maybe you can PR this change too.

Thanks @kantian for those suggestions and background investigation! I’ll try to PR those when I get a chance. I was about to write a PR for LazyList.equal the other day when I realized that embedded lazy lists would create a problem for the simple definition without the optional argument, and then noticed equal in the Enum docs, and wanted to see what response I got in this thread.

In this case, it may me more useful to mimic the forall2 definition to not raise an exception

By default I usually incline toward making use of functions that are already there, i.e. for_all2 in this case. After all, maybe someone will come up with a more efficient implementation of for_all2, and then equal will benefit. (The recent discussion of alternative List.map implementations was striking.) Is wrapping for_all2 in try/with potentially less efficient? Or just aesthetically undesirable? I don’t know anything about the exception mechanism, and am not sure I want to try to figure it out right now.

Without compiler optimization it is, and I don’t know if the OCaml compiler can inline for_all2 in the definition of equal and not raise the exception. In other word, I don’t know if the compiler will generate the same code for the two definitions (the second one is more efficient). You should see with the library maintainer which one they prefer.

OK, good idea. Thanks.

Using forall2 and catching LL.Different_list_size is not quite correct because the equality function may be the one raising the exception. You would have to wrap each user-equality call in a handler for correctness, and that is too expensive. On the other hand, if we had made the (in retrospect more correct) choice of having forall2 just return false in uneven lists, then it would be just fine.

1 Like

This situation can append only if we compare nested lazy lists, and in such a case if this is the eq function that raises the exception, we can consider that the elements are distinct and so are the overall lists. In other words, two elements of the lists are equal iff eq returns true. So no matter if the raising exception comes from for_all2 or from the eq parameter that compares elements, in both cases the two lists are not equal.

Sure, but I don’t know if you can change this behavior for backward compatibility.

1 Like

Are you sure you need (and you want to define) an equal operator for lists which might be infinite?

Yes.

More verbosely:

As I explained in earlier posts, other functions in LazyList have the same risk, so I think it’s generally considered an acceptable risk, which the programmer has the responsibility to manage.

Further, other types in Batteries that can generate infinite sequences–Enum and Seq–already have an equal predicate. In both cases, the test will try to run forever if given infinite (or merely long) lists that are equal at least before the program fails, if it does (they’re tail-recursive, so maybe they’d just run forever). LazyList is anomalous in not having an equal predicate. The existing design principle seems to be that it’s OK to have test for equality that could be applied to lists that might be infinite if the programmer isn’t careful.

Not having an equal predicate means that the natural thing to do on finite lists is to test for equality with =, but this is horrible method, because it doesn’t behave consistently, as I explained in earlier posts. The only good way to test for equality without an equal function is to in effect write an equal function or something more specialized.

Lastly, I think the strategy that’s emerged from this discussion is that the right way to test for equality on a lazy list is to allow passing in an element-equality predicate. So if one really wanted to be able to test for equality of lazy lists in a way that could not run forever, i,e. that tested only up to some number of elements n, one option would be to write an element-equality predicate that counted how many elements have been seen and then exited (with true, or false, or an exception) when n was exceeded, and then pass that to equal.

Anyway, that’s my feeling about this.

After reflexion, I’m not sure the current behavior of for_all2 is a wrong design choice. Currently we have 3 possibilities:

  • for_all2 returns true
  • for_all2 returns false
  • for_all2 raises an exception

In the first case, the caller can rely on the assumption that the two lists have the same size and that the parameter p always returns true. In the second one, the caller can rely on the assumption that there exists two values x in xs and y in ys such that p x y = false. Finally in the third one, the caller can rely on the assumption that the two lists don’t have the same size but that p x y is true for all possible values. By changing the current behavior of for_all2, the caller will lose information.

Here an example of code using this behavior:

let is_initial_segment xs ys =
  try LL.for_all2 (=) xs ys
  with LL.Different_list_size _ -> true

is_initial_segment (LL.range 1 10) (LL.range 1 15);;
- : bool = true

(* even if we don't know which one is an initial of the other *)
is_initial_segment (LL.range 1 15) (LL.range 1 10);;
- : bool = true