Why is Batteries.LazyList.map2 not lazy?

mars0i · November 10, 2017, 8:32pm

I don’t mean "Why didn’t the authors of Batteries make LazyList.map2 lazy, but rather “What makes its code non-lazy?”

The doc comment for LazyList.map2 in Batteries says:

Raises Different_list_size if the two lists have different lengths. Not tail-recursive, lazy. In particular, the exception is raised only after the shortest list has been entirely consumed.

I am interpreting the second sentence as “Not tail-recursive and not lazy” (as opposed to “Not tail-recursive, but lazy”).

One question is about the phrase “In particular”. How does the fact that one must consume one of the lists fully to get the exception make the function non-lazy? (Or non-tail-recursive?? No.)

Looking at the source, though, I’m not clear on what is and isn’t lazy. The definition is:

let map2 f l1 l2 =
  let rec aux l1 l2 =
    match (next l1, next l2) with
    | (Cons (h1, t1), Cons(h2, t2)) -> lazy (Cons (f h1 h2, aux t1 t2))
    | (Nil, Nil)                    -> nil
    | (Cons _, Nil) | (Nil, Cons _) -> raise (Different_list_size "LazyList.map2")
  in aux l1 l2

Compare this with LazyList.map, whose doc comment says it is a “Lazy map”:

let map f l =
  let rec aux rest =  match next rest with
    | Cons (x, (t : 'a t)) -> Cons (f x, lazy (aux t))
    | Nil                  -> Nil
  in lazy (aux l)

They differ in that in map2, lazy is applied to the Cons, while in map, lazy is inside the Cons, and applies to computation of the next node. So there is something lazy about map2, but it’s different from map’s laziness. I don’t understand the difference, though.

It seems as if map2 is even more lazy than map. In map, you immediately apply f to the head, and the rest of the list is lazily constructed. In map2, f isn’t even applied to the head immediately.

theindigamer · November 10, 2017, 10:50pm

It is lazy actually. “Not tail-recursive, lazy.” == (not tail-recursive) and (lazy).

Contrast this with the next function: “iter2 … Tail-recursive, eager.” == (tail-recursive) and (eager).

Of course, you don’t have to believe me, try looking at an example.

let x = LazyList.seq 0 ((+) 1) ((<=) 0);;
let y = LazyList.seq 0 (fun z -> z - 1) ((>=) 0);;
let plus_w_print x y = let () = print_string "Hi!" in x + y;;
let z = LazyList.map2 plus_w_print x y;;
(* doesn't print anything *)
Lazy.force z;;
(* Hi!- : int BatLazyList.node_t = Batteries.LazyList.Cons (0, <lazy>) *)

That’s why it also says:

In particular, the exception is raised only after the shortest list has been entirely consumed.

So long as you don’t consume any of the input lists, you won’t get an exception because of the laziness. If the evaluation were eager, you’d get an exception immediately when you called the function with lists of mismatched sizes.

In map, you immediately apply f to the head, and the rest of the list is lazily constructed.

Not quite. Look at the overall definition let map f l = let ... in lazy (aux l). The lazy keyword prevents immediate evaluation. You can check this:

utop # let id_w_print x = let () = print_int x in x;;
utop # let foo = LazyList.map id_w_print x;;
val foo : int LazyList.t = <lazy>
utop # Lazy.force foo;;
0- : int BatLazyList.node_t = Batteries.LazyList.Cons (0, <lazy>)

mars0i · November 10, 2017, 11:59pm

Oh–well that explains it! Just a scope ambiguity.

Thanks for all of the further explanation and demonstration. I hadn’t noticed that other lazy in the map definition, and now I understand the comment about the exception. Makes sense.

theindigamer · November 11, 2017, 1:40am

The point of my small example was to show how you can “test the hypothesis” yourself without
jumping into the source directly . It’s often much easier that way.

mars0i · November 11, 2017, 3:38am

@theindigamer, thanks. I knew the principle, but hadn’t taken the time to explore good ways to do it in OCaml with lazy lists, so your examples are definitely helpful. Adding print statements is not as trivial in OCaml as in many other languages; that’s a partly consequence of a rigorous type system.

theindigamer · November 11, 2017, 4:13am

Adding print statements is not as trivial in OCaml as in many other languages

I’m not really sure why you say that …? Granted it’s not Python-level trivial but it’s pretty close. Before any expression you can stick in a let () = print_any stdout foo in expr or (print_any stdout foo; expr) or something else (with additional type safety/correctness) along those lines …

You can also easily derive printing functions using ppx_deriving.

perry · November 11, 2017, 2:09pm

Indeed, one of the reasons I like working in OCaml is that it is so easy to just throw a print wherever I want to in order to debug. I know there are hacks to do this in Haskell without changing your types, but in theory in a pure functional language merely adding a print alters your types…

mars0i · November 11, 2017, 7:04pm

Well, I kind of think you proved my point. Anything seems simple if it’s familiar enough. How obvious do you think those expressions would be to a newbie?

OK, I am not a newbie, but far from an expert. I am well past the stage of the “What? There’s no special function that will print anything at all despite type restrictions?” (print_any: You meant that as a stand-in for print_string, print_int, etc., right? There is no print_any function afaik.) I learned printf syntax and routinely throw `Printf.printf into functions, so I don’t have to sequence a bunch of print statements with different types.

But I am still unsure about when print buffers will get flushed, which would be crucial for the lazy sequence test. (Sometimes it seems as if “%!” at the end of a format string did not flush the output as it’s supposed to, but I suspect that was due to a typo. Probably forgot the “%”.)

Hmm … suggesting learning an additional sub-language probably doesn’t help your case.

@perry, it’s not fair to argue that OCaml compares well with the most popular language that has very weird IO concepts! (I hear a Haskell programmer saying “What’s weird about IO in Haskell? Monads are simple!”)

My own comparison class is Lisps. But I don’t expect OCaml printing to be as straightforward, as, say, Clojure printing, just like I don’t expect it to have a map function that works on nearly every kind of sequence, as Clojure does. You get benefits in exchange for inconveniences either way.

gasche · November 11, 2017, 7:24pm

Note that it may be the case that map2 could be “lazier” than it currently is: it looks like it forces its input before we ask for any element of output – whereas map is written in a different style that ensures this does not happen.

This could be tested by creating a lazy list such as forcing the first cons cell prints something – but just defining it does not. Then you call map on it, you should observe no printing (only calling next on the result should print). If you call map2 on it, what do you see?

If it prints, feel free to send a pull request to make it lazier.

perry · November 11, 2017, 7:51pm

I think adding printfs to OCaml programs is as simple as one can reasonably make it while retaining the a typing discipline of this sort.

mars0i · November 11, 2017, 7:53pm

I won’t argue with that.

perry · November 11, 2017, 7:54pm

It’s also just plain simple. You can always replace foo with (printf ...; foo)

theindigamer · November 12, 2017, 1:28am

BatPervasives has the function print_any. I (incorrectly) assumed you’d seen it because I saw that you were using Batteries already. My bad.

How obvious do you think those expressions would be to a newbie?

I was in the same boat ~3 months back (zero OCaml knowledge) and a couple of key things I found useful were reading Real World OCaml and searching the docs for keywords. Unfortunately none of the docs have search bars, so I built them locally and used grep . That’s one complaint I can definitely get behind.

Jumping from language to language, my overall observation has been that if you feel that there is some operation that you need to do frequently and you can’t find a convenient way to do it, there is a decent chance that (a) you’re not using some feature because you don’t know of it or (b) the same thing can be done in a different manner but you haven’t had the aha! moment. In either case, reading stuff (books, docs, forums etc.) and asking questions is the way to go .

suggesting learning an additional sub-language probably doesn’t help your case.

You simply annotate your custom data type (say x) with [@@deriving show] (that’s literally it, there’s nothing else you need to know before using it) and you will magically get a show_x function available to use, without writing it out yourself manually. If all you’re using are primitive types, then you don’t even need to use this.

Maybe I’m missing something here but this doesn’t sound as radical as “learning an additional sub-language” .

I haven’t used Clojure I’m guessing that it has a generic show : 'a -> string function that can serialize anything. But you can’t have that with type safety [EDIT: as @perry pointed out, this restriction applies to OCaml’s type system out of the box, not with extensions that add type classes or similar functionality.]. print_any is exactly that with some caveats. If you want type safety, you can use the built-in printing functions or derive them for custom types…

perry · November 12, 2017, 2:25am

I figured out deriving show within a few days of starting to use OCaml. I still don’t really get anything that uses camlp4, but ppx stuff like deriving is not hard to figure out at all. deriving is every very well documented.

Well, given typeclasses, you can have a more consistent show or print experience across many types, and mostly you do indeed get something like that that in languages like Haskell.

That said, deriving show has been pretty good for me so far, though I am looking forward to extensions to the module system that I’ve heard mumbling about that would add a lot of what typeclasses are used for.

Oh, and I’ll also emphasize

I’m a total newcomer to the language.
I’m often easily confused by terrible documentation but I’ve been doing pretty well with OCaml so far.
I’d still like better docs. It would make life much easier. But all in all I’ve been happy enough that I jumped into OCaml with both feet.

theindigamer · November 12, 2017, 2:39am

Of course you’re right. I forgot to add the qualifier “without typeclasses”.

though I am looking forward to extensions to the module system that I’ve heard mumbling about that would add a lot of what typeclasses are used for

You might be interested in Modular implicits. I don’t know much about the subject but I’ve been reading that paper in my spare time.

perry · November 12, 2017, 2:44am

I believe Modular Implicits was one of the mechanisms I heard mentioned.

mars0i · November 12, 2017, 4:45am

@theindigamer, thanks very much for the info about print_any and show. Great! I had done a lot of searching on this issue on StackOverflow, and asked a question about it myself, and most of the answers just indicated that not having generic print functions was an (understandable and reasonable) limitation of OCaml (although I apparently forgot about a reference to BatPervasives.dump).

(No, given what you told me, I now don’t have to learn a new sub-language, but up until this moment, all that I knew about ppx was just that it involves one of several macro languages, and I didn’t see a need to delve into them. Anyway, I’m glad to know about [@@deriving show] now.)

(Still not completely sure when to expect print output to be flushed, but seems to work as intended in this case. Not a problem.)

mars0i · November 12, 2017, 5:35am

There is a difference, but I am not sure whether it’s exactly what you described @gasche. I think so. Looking at a value from map2 does cause the next value to be computed. map doesn’t do this.

# let pr = Printf.printf "%d\n%!";;
val pr : int -> unit = <fun>
# let s1 = LL.seq 0 (fun n -> let n' = n + 1 in pr n'; n') (fun _ -> true);;
val s1 : int LL.t = <lazy>
# let s2 = LL.seq 0 (fun n -> let n' = n + 1 in pr n'; n') (fun _ -> true);;
val s2 : int LL.t = <lazy>
# let s3 = LL.seq 0 (fun n -> let n' = n + 1 in pr n'; n') (fun _ -> true);;
val s3 : int LL.t = <lazy>
# let s1' = LL.map (fun n -> n)  s1;;
val s1' : int LL.t = <lazy>
# let s2s3 = LL.map2 (fun x y -> x, y) s2 s3;;
val s2s3 : (int * int) LL.t = <lazy>
# LL.hd s1';;
- : int = 0
# LL.hd s2s3;;
1
1
- : int * int = (0, 0)
# LL.hd (LL.tl s1');;
1
- : int = 1

Submitted an issue. Have to fix up my previous pull request before I add to it.

gasche · November 12, 2017, 7:40am

Note that the recommended process for pull-requests is to create a separate branch (on your repository) for each pull request, and that this lets you easily work on several pull requests in parallel.

mars0i · November 12, 2017, 2:54pm

Ah, ok, @gasche. I thought it was best to work in the branch to which I’d contribute. I never understood why github wouldn’t allow distinct unmerged PRs that way.

Topic		Replies	Views
Equality of LazyLists Learning lazy , batteries	18	4696	November 28, 2017
Batteries.LazyList.lazy_fold_right? Learning lazy , batteries	2	1392	November 12, 2017
More on LazyList vs Enum? Learning lazy , sequence	2	953	June 7, 2017
A new List.map that is both stack-safe and fast Community article , performance	34	12614	February 24, 2018
Fastest implementation of stack-safe List.map, featuring Containers, Batteries, and Base Community	4	1395	June 22, 2017

Why is Batteries.LazyList.map2 not lazy?

Related topics