Sequences: two schools of thought?

When reading the book “More OCaml” a few months ago, I had a :exploding_head: moment when I got exposed to a type representing an infinite list. So simple! Beautiful!!

Fast forward a few months later, and I’m re-exploring the subject but this time learning a bit about the standard lib’s Seq module.

(I’ve made the following code a bit more verbose on purpose, for clarity)

(*
   dune exec --no-print-directory ./primes.exe
*)

module Via_seq = struct
  let rec make_primes (seq : int Seq.t) : int Seq.t =
    match seq () with
    | Seq.Nil -> fun () -> Seq.Nil
    | Seq.Cons (h, seq) ->
      let next_primes = Seq.filter (fun n -> n mod h <> 0) seq in
      fun () -> Cons (h, make_primes next_primes)
  ;;

  let primes : int Seq.t = make_primes @@ Seq.ints 2

  let main () =
    print_string
    @@ String.concat " "
    @@ List.map string_of_int
    @@ List.of_seq
    @@ Seq.take 20 primes
  ;;
end

module Via_own = struct
  (** A lazy list represents an infinite list.
      It has no tail, but a "tail function".
      It has no `Nil` constructor because the list has no end. *)
  type 'a lazylist = Cons of 'a * (unit -> 'a lazylist)

  (** Builds a lazylist from [n], always increasing by 1 *)
  let rec lseq n = Cons (n, fun () -> lseq (n + 1))

  let rec ltake n (Cons (h, tf)) =
    if n > 0 then
      h :: ltake (n - 1) (tf ())
    else
      []
  ;;

  let rec lfilter test (Cons (h, tf)) : 'a lazylist =
    if test h then
      Cons (h, fun () -> lfilter test (tf ()))
    else
      lfilter test (tf ())
  ;;

  let rec make_primes (Cons (h, tf)) =
    let next_primes = lfilter (fun n -> n mod h <> 0) (tf ()) in
    Cons (h, fun () -> make_primes next_primes)
  ;;

  let primes : int lazylist = make_primes (lseq 2)

  let main () =
    print_string @@ String.concat " " @@ List.map string_of_int @@ ltake 20 primes
  ;;
end

let () =
  ()
  ; print_endline "Primes (via Seq module)"
  ; Via_seq.main ()
  ; print_newline ()
  ; print_newline ()
  ; print_endline "Primes (via own lazylist type)"
  ; Via_own.main ()
  ; print_newline ()
;;

Even after playing with it for a while, the Seq module doesn’t feel as nice in comparaison, more complicated.

One thing that bugs me is introducing Nil which I feel kinda breaks the concept a bit (the list may or may not be infinite). As an example, if I want to represent a sequence of prime numbers, then introducing Nil conceptually doesn’t make much sens to me.

I looked around at other implementations and introducing Nil seems to be not so uncommon. However, I fail to understand what it buys me. Taking my generating primes example, if I ever wanted to stop producing primes, I could easily produce Some numbers up to a certain point (None otherwise) to signal to a consumer that it can stop consuming this “infinite” list (there won’t be other values of interest from this point on).

I found a reference to this simpler way of thinking about sequences here: 8.4. Sequences — OCaml Programming: Correct + Efficient + Beautiful

The “More OCaml” book is listed as a reference in the summary section.


So my current understanding is that there seems to be two school of thoughts:

  1. represent infinite sequences
  2. representing sequences that may or may not be infinite

In both cases, the value of the next item is produced as-needed, by evaluating a thunk.

I’d be interested to learn more. When and why one would want to use one representation vs the other?

And if there are other known/useful libraries on the matter.

3 Likes

Shouldn’t that be: because the list has no beginning. Nil is the base case?

with link to my favorite OCaml book. But I call foul: I’ve never seen a definition of “sequence” as “infinite list” anywhere but there. I think it’s a major flaw. The usual term is “stream”.

1 Like

I think you summarized the issue pretty well yourself:

If you need to represent sequences that may be finite, then having a Nil case is useful. If your sequences are known to be infinite, not having it is more appropriate.

Cheers,
Nicolas

1 Like

There are 3 cases:

  • Finite. Nil required
  • Infinite. Nil forbidden
  • Either. Nil???

I would say

  • finite: Nil required
  • infinite: Nil not required (the type could declare it but it won’t be used)
  • either: Nil required
2 Likes

Are OCaml datatypes coinductive, or in-between inductive and coinductive?
What is coinduction? - Computer Science Stack Exchange

Is there an aleph_1 sized list?

How bout this:

  • Finite: defined by induction, base case (Nil) required.
  • Infinite: described by coinduction. No base case and no constructors (Nil forbidden - if you define it, then your definition is not coinductive.)
  • Both: defined as the coproduct of the other two.

Yes! :wink: I’ve been thinking about this for many months (still trying to master coinduction/coalgebra/coetc). Some are clear inductive, e.g. enumerated (variant) types. But others, I’m not sure. Are polymorphic variants defined by induction or coinduction? I’m leaning toward the latter but I’m not sure.

1 Like

In my view it’s useful to think about lazy sequences in terms of what they give you in the ‘real world’. Lazy sequences represent an iteration design pattern where the producer and the consumer are decoupled.

For example, you could have a producer that reads rows from a database let’s say 100 at a time because a database query is relatively cheap. Your producer gives you a sequence that is behind the scenes in batches of 100.

Your consumer could be making HTTP calls with the items from the DB, and the calls are more expensive so it could make calls in batches of 10. So you could write a lazy sequence that reads 10 items at a time while still processing each item individually.

Then when you put these together you have an automatic backpressure mechanism where:

  1. You initially read 100 rows from the DB
  2. Then you send 10 HTTP requests
  3. Then you send 10 more

  1. Then you send 10 more
  2. Then you read 100 more rows from the DB

Most of this iteration behaviour is captured inside the lazy sequences by design. At the call site you would just use Seq.iter and think in terms of doing ‘one thing at a time’.

1 Like

Sure, but there’s a big difference between lazy evaluation and infinitary types. “Lazy sequence (stream?)'” conflates two distinct notions.

Yes they do, which is why they include the Nil (end of stream) case which one might think is not needed if a stream could never end. So Seq has a more general design than ‘only infinite stream’.

“lazy” is an evaluation strategy. It has nothing to do with types.

Nil (end of stream) - what is that supposed to mean? By definition streams do not have an end.

I see Seq as the kind of sequence suited for iterating over containers and thus suited for what’s most useful in the standard library. On one hand it could be extended with more cases, esp. if streaming from resources are involved:

  • An error-case may be needed if expanding the sequence may fail.
  • A yield-case may be needed to deal with concurrency (at least for OCaml 4).

But adding error and yield case when they are not needed, means the consumer will need to handle these cases or risk runtime errors. (It would be possible to use phantom types to eliminate these cases, though this is a bit more type engineering than we’re use to from the standard library, and may still require -> . cases when pattern matching.)

On the other hand, the Nil is redundant for infinite sequences. While it is possible to use such a sequence type for finite sequences by using an option type for the values, I think this is inelegant, since it does not guarantee that the sequence does not restart after an arbitrary number of None. The same quirkiness applies if we attempt to use a result as value type to represent errors.

I therefore think it is better to specialize the sequence for the purpose, which in this case I think are finite containers, and use alternative definitions where needed.

(Added: I mention containers though it’s of course very useful for pure computations, as well, finite or not.)

It is definitively possible to reflect evaluation strategy in the type system (at least if the ambient evaluation strategy is not strong reduction?); for instance with thunks or CPS-transforms. And symmetrically any co-inductive type needs to include evaluation steps in its representation to be able to represent diverging computation without diverging.

But since OCaml allows diverging terms, the difference between inductive types and co-inductive type is not as primordial as in a non-Turing complete language like Rocq (formerly Coq).

3 Likes

foo Seq.t is a sequence of values of type foo that is produced on demand, ie as you traverse it. It doesn’t claim to be specialized for infinite streams. Its main goal is to provide a common interface for “sequence of elements” as a common vocabulary between data structures and algorithms (and also to represent sequences that might have trouble fitting all in memory at once).

4 Likes

What is Rock?

additional characters to satisfy posting constraints

Rocq is the new name for the Coq proof assistant.

You can use Seq without Nil if you change your method of producing primes. This is from Primality test - Wikipedia :

 # let is_prime n =
  if n <= 1 then
    false
  else
    (if n = 2 || n = 3 then
       true
     else
       (if n mod 2 = 0 || n mod 3 = 0 then
          false
        else
          let i = ref 5 in
          while !i * !i <= n && n mod !i <> 0 && n mod (!i + 2) <> 0 do
            i := !i + 6
          done;
          (!i * !i > n)
       )
    );;
val is_prime : int -> bool = <fun>

# let rec lazy_primes_from start () =
  let n = ref start in
  while not (is_prime !n) do
    n := !n + 1
  done;
  Seq.Cons (!n, lazy_primes_from (!n + 1));;
val lazy_primes_from : int -> unit -> int Seq.node = <fun>

# lazy_primes_from 15 |> Seq.take 10 |> List.of_seq;;
- : int list = [17; 19; 23; 29; 31; 37; 41; 43; 47; 53]

I reckon that’s probably true for PL design. Is it true for “pure” type theory? My (naive) impression is that eval strategy is not relevant for pure type theory.