Why is Batteries.LazyList.map2 not lazy?

lazy
batteries

#1

I don’t mean "Why didn’t the authors of Batteries make LazyList.map2 lazy, but rather “What makes its code non-lazy?”

The doc comment for LazyList.map2 in Batteries says:

Raises Different_list_size if the two lists have different lengths. Not tail-recursive, lazy. In particular, the exception is raised only after the shortest list has been entirely consumed.

I am interpreting the second sentence as “Not tail-recursive and not lazy” (as opposed to “Not tail-recursive, but lazy”).

One question is about the phrase “In particular”. How does the fact that one must consume one of the lists fully to get the exception make the function non-lazy? (Or non-tail-recursive?? No.)

Looking at the source, though, I’m not clear on what is and isn’t lazy. The definition is:

let map2 f l1 l2 =
  let rec aux l1 l2 =
    match (next l1, next l2) with
    | (Cons (h1, t1), Cons(h2, t2)) -> lazy (Cons (f h1 h2, aux t1 t2))
    | (Nil, Nil)                    -> nil
    | (Cons _, Nil) | (Nil, Cons _) -> raise (Different_list_size "LazyList.map2")
  in aux l1 l2

Compare this with LazyList.map, whose doc comment says it is a “Lazy map”:

let map f l =
  let rec aux rest =  match next rest with
    | Cons (x, (t : 'a t)) -> Cons (f x, lazy (aux t))
    | Nil                  -> Nil
  in lazy (aux l)

They differ in that in map2, lazy is applied to the Cons, while in map, lazy is inside the Cons, and applies to computation of the next node. So there is something lazy about map2, but it’s different from map's laziness. I don’t understand the difference, though.

It seems as if map2 is even more lazy than map. In map, you immediately apply f to the head, and the rest of the list is lazily constructed. In map2, f isn’t even applied to the head immediately.


#2

It is lazy actually. “Not tail-recursive, lazy.” == (not tail-recursive) and (lazy).

Contrast this with the next function: “iter2 … Tail-recursive, eager.” == (tail-recursive) and (eager).

Of course, you don’t have to believe me, try looking at an example.

let x = LazyList.seq 0 ((+) 1) ((<=) 0);;
let y = LazyList.seq 0 (fun z -> z - 1) ((>=) 0);;
let plus_w_print x y = let () = print_string "Hi!" in x + y;;
let z = LazyList.map2 plus_w_print x y;;
(* doesn't print anything *)
Lazy.force z;;
(* Hi!- : int BatLazyList.node_t = Batteries.LazyList.Cons (0, <lazy>) *)

That’s why it also says:

In particular, the exception is raised only after the shortest list has been entirely consumed.

So long as you don’t consume any of the input lists, you won’t get an exception because of the laziness. If the evaluation were eager, you’d get an exception immediately when you called the function with lists of mismatched sizes.

In map, you immediately apply f to the head, and the rest of the list is lazily constructed.

Not quite. Look at the overall definition let map f l = let ... in lazy (aux l). The lazy keyword prevents immediate evaluation. You can check this:

utop # let id_w_print x = let () = print_int x in x;;
utop # let foo = LazyList.map id_w_print x;;
val foo : int LazyList.t = <lazy>
utop # Lazy.force foo;;
0- : int BatLazyList.node_t = Batteries.LazyList.Cons (0, <lazy>)

#3

Oh–well that explains it! Just a scope ambiguity.

Thanks for all of the further explanation and demonstration. I hadn’t noticed that other lazy in the map definition, and now I understand the comment about the exception. Makes sense.


#4

The point of my small example was to show how you can “test the hypothesis” yourself without
jumping into the source directly :slightly_smiling_face: . It’s often much easier that way.


#5

@theindigamer, thanks. I knew the principle, but hadn’t taken the time to explore good ways to do it in OCaml with lazy lists, so your examples are definitely helpful. Adding print statements is not as trivial in OCaml as in many other languages; that’s a partly consequence of a rigorous type system.


#6

Adding print statements is not as trivial in OCaml as in many other languages

I’m not really sure why you say that …? Granted it’s not Python-level trivial but it’s pretty close. Before any expression you can stick in a let () = print_any stdout foo in expr or (print_any stdout foo; expr) or something else (with additional type safety/correctness) along those lines …

You can also easily derive printing functions using ppx_deriving.


#7

Indeed, one of the reasons I like working in OCaml is that it is so easy to just throw a print wherever I want to in order to debug. I know there are hacks to do this in Haskell without changing your types, but in theory in a pure functional language merely adding a print alters your types…


#8

Well, I kind of think you proved my point. Anything seems simple if it’s familiar enough. How obvious do you think those expressions would be to a newbie?

OK, I am not a newbie, but far from an expert. I am well past the stage of the “What? There’s no special function that will print anything at all despite type restrictions?” (print_any: You meant that as a stand-in for print_string, print_int, etc., right? There is no print_any function afaik.) I learned printf syntax and routinely throw `Printf.printf into functions, so I don’t have to sequence a bunch of print statements with different types.

But I am still unsure about when print buffers will get flushed, which would be crucial for the lazy sequence test. (Sometimes it seems as if “%!” at the end of a format string did not flush the output as it’s supposed to, but I suspect that was due to a typo. Probably forgot the “%”.)

Hmm … suggesting learning an additional sub-language probably doesn’t help your case. :slight_smile:

@perry, it’s not fair to argue that OCaml compares well with the most popular language that has very weird IO concepts! :slight_smile: (I hear a Haskell programmer saying “What’s weird about IO in Haskell? Monads are simple!”)

My own comparison class is Lisps. But I don’t expect OCaml printing to be as straightforward, as, say, Clojure printing, just like I don’t expect it to have a map function that works on nearly every kind of sequence, as Clojure does. You get benefits in exchange for inconveniences either way.


#9

Note that it may be the case that map2 could be “lazier” than it currently is: it looks like it forces its input before we ask for any element of output – whereas map is written in a different style that ensures this does not happen.

This could be tested by creating a lazy list such as forcing the first cons cell prints something – but just defining it does not. Then you call map on it, you should observe no printing (only calling next on the result should print). If you call map2 on it, what do you see?

If it prints, feel free to send a pull request to make it lazier.


#10

I think adding printfs to OCaml programs is as simple as one can reasonably make it while retaining the a typing discipline of this sort.


#11

I won’t argue with that.


#12

It’s also just plain simple. You can always replace foo with (printf ...; foo)


#13

BatPervasives has the function print_any. I (incorrectly) assumed you’d seen it because I saw that you were using Batteries already. My bad.

How obvious do you think those expressions would be to a newbie?

I was in the same boat ~3 months back (zero OCaml knowledge) and a couple of key things I found useful were reading Real World OCaml and searching the docs for keywords. Unfortunately none of the docs have search bars, so I built them locally and used grep :frowning:. That’s one complaint I can definitely get behind.


Jumping from language to language, my overall observation has been that if you feel that there is some operation that you need to do frequently and you can’t find a convenient way to do it, there is a decent chance that (a) you’re not using some feature because you don’t know of it or (b) the same thing can be done in a different manner but you haven’t had the aha! moment. In either case, reading stuff (books, docs, forums etc.) and asking questions is the way to go :slightly_smiling_face:.

suggesting learning an additional sub-language probably doesn’t help your case.

You simply annotate your custom data type (say x) with [@@deriving show] (that’s literally it, there’s nothing else you need to know before using it) and you will magically get a show_x function available to use, without writing it out yourself manually. If all you’re using are primitive types, then you don’t even need to use this.

Maybe I’m missing something here but this doesn’t sound as radical as “learning an additional sub-language” :stuck_out_tongue_winking_eye:.

I haven’t used Clojure I’m guessing that it has a generic show : 'a -> string function that can serialize anything. But you can’t have that with type safety [EDIT: as @perry pointed out, this restriction applies to OCaml’s type system out of the box, not with extensions that add type classes or similar functionality.]. print_any is exactly that with some caveats. If you want type safety, you can use the built-in printing functions or derive them for custom types…


#14

I figured out deriving show within a few days of starting to use OCaml. I still don’t really get anything that uses camlp4, but ppx stuff like deriving is not hard to figure out at all. deriving is every very well documented.

Well, given typeclasses, you can have a more consistent show or print experience across many types, and mostly you do indeed get something like that that in languages like Haskell.

That said, deriving show has been pretty good for me so far, though I am looking forward to extensions to the module system that I’ve heard mumbling about that would add a lot of what typeclasses are used for.

Oh, and I’ll also emphasize

  1. I’m a total newcomer to the language.
  2. I’m often easily confused by terrible documentation but I’ve been doing pretty well with OCaml so far.
  3. I’d still like better docs. It would make life much easier. But all in all I’ve been happy enough that I jumped into OCaml with both feet.

#15

Of course you’re right. I forgot to add the qualifier “without typeclasses”.

though I am looking forward to extensions to the module system that I’ve heard mumbling about that would add a lot of what typeclasses are used for

You might be interested in Modular implicits. I don’t know much about the subject but I’ve been reading that paper in my spare time.


#16

I believe Modular Implicits was one of the mechanisms I heard mentioned.


#17

@theindigamer, thanks very much for the info about print_any and show. Great! I had done a lot of searching on this issue on StackOverflow, and asked a question about it myself, and most of the answers just indicated that not having generic print functions was an (understandable and reasonable) limitation of OCaml (although I apparently forgot about a reference to BatPervasives.dump).

(No, given what you told me, I now don’t have to learn a new sub-language, but up until this moment, all that I knew about ppx was just that it involves one of several macro languages, and I didn’t see a need to delve into them. Anyway, I’m glad to know about [@@deriving show] now.)

(Still not completely sure when to expect print output to be flushed, but seems to work as intended in this case. Not a problem.)


#18

There is a difference, but I am not sure whether it’s exactly what you described @gasche. I think so. Looking at a value from map2 does cause the next value to be computed. map doesn’t do this.

# let pr = Printf.printf "%d\n%!";;
val pr : int -> unit = <fun>
# let s1 = LL.seq 0 (fun n -> let n' = n + 1 in pr n'; n') (fun _ -> true);;
val s1 : int LL.t = <lazy>
# let s2 = LL.seq 0 (fun n -> let n' = n + 1 in pr n'; n') (fun _ -> true);;
val s2 : int LL.t = <lazy>
# let s3 = LL.seq 0 (fun n -> let n' = n + 1 in pr n'; n') (fun _ -> true);;
val s3 : int LL.t = <lazy>
# let s1' = LL.map (fun n -> n)  s1;;
val s1' : int LL.t = <lazy>
# let s2s3 = LL.map2 (fun x y -> x, y) s2 s3;;
val s2s3 : (int * int) LL.t = <lazy>
# LL.hd s1';;
- : int = 0
# LL.hd s2s3;;
1
1
- : int * int = (0, 0)
# LL.hd (LL.tl s1');;
1
- : int = 1

Submitted an issue. Have to fix up my previous pull request before I add to it.


#19

Note that the recommended process for pull-requests is to create a separate branch (on your repository) for each pull request, and that this lets you easily work on several pull requests in parallel.


#20

Ah, ok, @gasche. I thought it was best to work in the branch to which I’d contribute. I never understood why github wouldn’t allow distinct unmerged PRs that way.