How to write map2 for Base.Sequence?

I would like to have a map2 function for Base.Sequence. I can write a sort of version like this:

open Base.Sequence
let map2 f s1 s2 = map ~f (zip s1 s2)

where zip creates a sequence of pairs from elements of s1 and s2, so that map2 has the signature

('a * 'b -> 'c) -> 'a S.t -> 'b S.t -> 'c S.t

However, it seems unnecessary to create pairs just to pull them apart, and in any event I feel that a more natural signature is

('a -> 'b -> 'c) -> 'a S.t -> 'b S.t -> 'c S.t

I have have not figured out a way to write the latter function. I can pull apart sequences using hd and tl, but I can’t figure out how to build sequences one element at a time. There’s nothing like a cons function afaics, and the Sequence constructor that is used to build sequences isn’t visible outside the source file–or at least, I can’t figure out how to use it in my code. I get Error: Unbound constructor Sequence no matter what I try. I think this has to do with the type definitions and signatures in the sequence.ml

(* 'a is an item in the sequence, 's is the state that will produce the remainder of the
   sequence *)
type +_ t =
  | Sequence : 's * ('s -> ('a,'s) Step.t) -> 'a t

type 'a sequence = 'a t

vs. sequence.mli files, although I’m not sure understand the .ml and .mli type code fully.

Is there any way to build sequences step by step (whether by accessing the Sequence constructor or not), or some other way to write a function like map2, without modifying sequence.ml and rebuilding Base? (Maybe I would have trouble writing the function anyway, but at this point, I can’t even get past the Unbound constructor error.)

(At one point I thought I could use Base.Seq.Step to implement map2, but now I don’t think that made sense; the Yield constructor doesn’t expect a function to return the rest of the sequence; it just wants code to compute part of the next element. This is what I tried:

# let rec map2 f xs1 xs2 =
  match xs1, xs2 with
  | (Done, _)   | (_, Done) -> Done
  | (Yield (x1, rest1), Yield (x2, rest2)) ->
      Yield ((f x1 x2), map2 f rest1 rest2)
  | (Skip _, _) | (_, Skip _) -> failwith "Skip-handling not implemented";;
Error: This expression has type 'a but an expression was expected of type ('b, 'a) S.Step.t
       The type variable 'a occurs inside ('b, 'a) S.Step.t

)

I just realized that I can get the signature I want like this:

let map2 f xs ys = map ~f:(fun (x, y) -> f x y) (zip xs ys)

That still incorporates the seemingly unnecessary pair-building using zip.

I think the following works, without adding any allocation of pairs, and without eagerly deconstructing the input lists, only consuming them as they’re needed.

open! Base
open! Stdio
open Expect_test_helpers_kernel

let map2 s1 s2 ~f =
  Sequence.unfold ~init:(s1,s2)
    ~f:(fun (s1,s2) ->
      match Sequence.next s1, Sequence.next s2 with
      | None, None -> None
      | None, _ | _, None -> failwith "unequal lists"
      | Some (x1,rest1), Some (x2,rest2) -> Some (f x1 x2, (rest1,rest2)))


let%expect_test "test laziness" =
  let s = map2 (Sequence.range 0 100) (Sequence.range 0 100)
            ~f:(fun x y -> printf ".%!"; x + y)
  in
  [%expect {| |}];
  let s' = Sequence.take s 10 |> Sequence.to_list  in
  [%expect {| .......... |}];
  print_s [%sexp (s' : int list)];
  [%expect {|
    (0 2 4 6 8 10 12 14 16 18) |}];

Note that Sequence.next itself allocates, but I suspect that these should be inlined away in practice. That said, it’s worth checking.

1 Like

Though it looks like, even under Flambda, all of the options allocate quite a lot, though the one I wrote is a bit better:

let%expect_test "test laziness and allocation" =
  let s = map2 (Sequence.range 0 100) (Sequence.range 0 100) ~f:(+) in
  let s' =
    Sequence.zip (Sequence.range 0 100) (Sequence.range 0 100)
    |> Sequence.map ~f:(fun (x,y) -> x + y)
  in
  let _ =
    Expect_test_helpers_kernel.require_no_allocation [%here]
      (fun () -> Sequence.take s' 10 |> Sequence.to_list)
  in
  [%expect {|
    (* CR require-failed: yminsky/src/z.ml:21:53.
       Do not 'X' this CR; instead make the required property true,
       which will make the CR disappear.  For more information, see
       [Expect_test_helpers_kernel.require]. *)
    ("allocation exceeded limit"
      (allocation_limit (Minor_words 0))
      (minor_words_allocated 565)) |}];
  let s' =
    Expect_test_helpers_kernel.require_no_allocation [%here]
      (fun () -> Sequence.take s 10 |> Sequence.to_list)
  in
  [%expect {|
    (* CR require-failed: yminsky/src/z.ml:33:53.
       Do not 'X' this CR; instead make the required property true,
       which will make the CR disappear.  For more information, see
       [Expect_test_helpers_kernel.require]. *)
    ("allocation exceeded limit"
      (allocation_limit (Minor_words 0))
      (minor_words_allocated 435)) |}];
  print_s [%sexp (s' : int list)];
  [%expect {|
    (0 2 4 6 8 10 12 14 16 18) |}];
1 Like

Thanks @Yaron_Minsky for the definition and careful testing!

I had wondered if there was a way to do it with unfold, but I couldn’t see how. I have experimented with unfold as well as unfold_step, but I didn’t see that a combination of next and Some like that could work. I will have to study the example and the docs a bit more. I hadn’t used expect_test, either. That looks extremely useful and convenient.

(Sorry to ask, but can someone tell me what open! does? I’ve been looking for the answer, but it’s difficult to search on, and it’s an uncommon bit of syntax.)

open! is documented in the infamous “Language extensions” section of the manual: http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec250 . It makes explicit the fact that opening the module can shadow identifiers already present in the context.

2 Likes

Undocumented but equally important: open! Will silence the unused module warning. This is useful for libraries like Base which effectively set your environment, and so you want them open even the current state of the code happens not to use it.

1 Like

It would be good for this to be documented. Someone should file a Mantis ticket at the very least…

In fact there were no consensus on this feature the last time that this subject was discussed in https://github.com/ocaml/ocaml/pull/1110 . If I remember correctly (@gasche may be able to confirm) , this is also the main reason why open! is still in the language extension chapter.

There’s some unfortunate tension about this. The initial use of open! was only supposed to be for the documentation use. But it’s now been used very heavily for the the use I described above. One set of people is not eager to update the documentation to sanction the behavior they consider a bug. Another set of people (myself included) are not eager to have this very common use case be made impossible or ungainly.

So, we sit in a somewhat awkward spot.

What does “set your environment” mean? No need for full detail if it’s complicated. e.g. define operators?

Base shadows many modules and libraries in the stdlib. I want to make sure to have it open, even if I don’t happen to be using it in the given piece of code. Async is similar, in that it shadows blocking operations from Core with deferred-returning versions. Again, even if I don’t happen to be using Async presently, I think of it as defining the environment in which this code is supposed to operate, and so I want it open as a way of stating something about the policy of what modules this code should use.

1 Like