Looking through all of the FP books I own (on OCaml, SML, Haskell, various lisps), some books at my library, some on the web, whatever blog posts and such are turned up by web searches, StackOverflow, this forum, plus the ocamldoc of some unfold functions in Batteries and Core (the functions aren’t always named “unfold”, but the docs or other people tell me that’s what they are), …
I have been unable to find a good introduction to the unfold concept. I have seen and have been told various versions of, essentially, "it’s the opposite of fold". I have even managed to understand one unfold function well enough to improve the ocamldoc for it. But I still don’t feel that I really understand the general concept. I don’t grok it.
What’s a good source for this concept? I need to rub my nose in a page or two or three on the topic, which are not common, or I think I would have found some of them by now. Can’t just be something in some special FP air that I have never breathed. People learn it somewhere.
Or if you want to explain it here, that’s great, and much appreciated, but I think I need more than a sentence or two.
(If there’s something on the web, I’ll add a link somewhere on ocamlverse.)
I don’t know much about the theory side of the concept, but for me, unfold represents an operation where you generate a sequence based on repeatedly applying a transformation to a “seed” or “state”.
So, for example, the sequence of natural numbers can be generated by starting with the state 0 and repeatedly incrementing the state by 1, like so:
let natural_numbers =
Base.Sequence.unfold ~init:0 ~f:(fun state ->
Some ( state (* yield an element*)
, state + 1 (* next state *)
By repeatedly applying the given function to successive values of the sequence, you will construct each next element of the sequence in turn. This is in some sense the opposite of a fold, because a fold takes a sequence and reduces it down to a single value, whereas an unfold takes a single value and builds a sequence out of it.
The best explanation I’ve seen can be found in Conal Elliott’s talk titled “Folds and unfolds all around us”. It’s simply beautiful how he reveals the symmetry between folds and unfolds, showing examples and presenting their interpretation from the category theory point of view.
Unfolds are also known as anamorphisms and there is a good wikipedia article about them.
Unfold is indeed dual to fold, and while fold is a reification of the structural induction, fold, in its duality, induces a co-inductive definition of data. Where fold abstracts a loop that consumes data, unfold abstracts a loop that produces data, so this also, consumer/producer, processor/generator duality. And, in the more general notion, this is the recurision/co-recursion
Another examples, a dual of the option data type, is the future (aka promise, aka co-option) co-data type. Both of them implement 1+x algebra, i.e., they represent a value that might be absent. Another example, is list vs co-list aka stream.
val unfold : ('b -> ('a * 'b) option) -> 'b -> 'a t
(** [unfold f acc] calls [f acc] and:
- if [f acc = Some (x, acc')], yield [x], continue with [unfold f acc'].
- if [f acc = None], stops.
let f = function
| 10 -> None
| x -> Some (x, x+1) in
unfold f 0 |> to_list
Thanks for all of these replies. The piece cited by @rizo looks very useful. @ivg, I had seen the Wikepedia article–I didn’t mention that–but it seemed very general and somewhat abstract, and I was hoping not to have to work so hard! However, since you recommend it, I now consider its study well worth the trouble, and I will read it properly.
I actually know how to use unfold functions, in the case of simple sequential data structures, at least. In fact it’s Base.Sequence's unfold that was the most recent stimulus for my question.
However, I think unfold is weird. A more natural way to do the same thing, to my mind, is with a function that is something like seq in Batteries’ LazyList and Enum. That is, rather than passing an initial value of type 'a and a function with type ('s ‑> ('a * 's)) (or some variation using e.g. Option), you pass a function with type 'a -> 'a that simply gives the next item from the previous one. I learned about this idea from Clojure’s iterate, whose description is:
(iterate f x): Returns a lazy sequence of x, (f x), (f (f x)) etc.
The Batteriesseq functions add an additional function argument to indicate that the sequence should end, but that’s inessential for lazy sequences.
I know that unfold is more general, and that it can do things conveniently that Batteries.*.seq/iterate-like functions can’t, but the additional complexity in unfold functions is unnecessary for all of the use cases I have had. I had a difficult time figuring out how to use my first unfold function. It was very confusing at that point.
So I just want a deeper understanding of where the heck this idea came from, since from the point of view of seq-like functions, it’s a strange idea. I think the links above will probably help. Thanks to everyone.
Well, 'a -> 'a is a possible implementation of the unfold function. It is especially good if you’re not distinguishing between the internal state of your system and the observed state, for example, when your system is just an ascending sequence of number, or sequence of odd numbers, etc. In that case, in terms of the Core’s unfold, your function f would be something like fun x -> Some (x+1,x+1) because your internal and external states are the same, and this sort of duplication.
However, it will soon become obvious that there are systems that have a state that is different from what is observed, for example, the sine function. The state is time, but the observed value is the amplitude of sine. In fact, you can many different facets of the state, i.e., the phase, or the I/Q coordinates of the sine.
An alternative definition of Iterator (that’s our name for unfold) is the following.
module Iterator = struct
module type Base = sig
val value : t -> dom
module type Finite = sig
val next : t -> t option
module type Infinite = sig
val next : t -> t
So, in other words, in general case, we need to distinguish between the state (the iterator) and the value that it produces (the observable state). Without modules, it would be:
val unfold : 's -> ('s -> 's option) -> ('s -> 'a) -> 'a sequence
val unfold : 's -> ('s -> ('a * 's) option) -> 'a sequence
They are isomorphic, just core’s interface requires you to provide the value on every step as a constituent of a pair. (Personally, I like the former representation more). The reason why they use this representation (or a more general with Step, because this is how the sequence is represented underneath the hood:
type _ t = Sequence : 's * ('s -> ('a,'s) Step.t) -> 'a t
P.S. Although it would be easy to implement a function like
val from : ('a -> 'a) -> 'a -> 'a sequence
let from f init = Sequence.unfold ~init ~f:(fun x -> let x = f x in Some (x,x))
I think that Core developers don’t want to provide it, as it will provoke people to use an internal mutable state to generate sequences. So they are trying to provide an interface that is easy to use correctly and hard to use incorrectly.
Thanks @ivg–that’s a very nice exposition of what can be done with unfold that can’t be done with seq/iterate/from.
That’s a nice point. Using mutable state is what one would do in Clojure if you needed state in an iterate sequence, since Clojure has no unfold. I agree that’s undesirable.
On the other hand, I still think that unfold can be very confusing for someone who’s unfamiliar with it, and that’s undesirable. I don’t think unfold should be removed, but it’s not a bad thing for a module to be more usable, and generating a sequence iteratively without carrying state is a natural thing to want to do. This should be easy, imo. seq/from/iterate is a basic building block function. (Well, it’s not in OCaml, in practice, or there would be more instances of it. But it could be, though that would be for designs that are less common in OCaml at present, maybe.)
seq/from/unfold aren’t just good for generating sequences of simple values, of course. I routinely use this sort of function to generate successive timesteps of a simulation. For example, in Clojure I used iterate to generate successive states of agent-based simulation models. This created a very clean functional design for an ABM, if you are able to organize your per-timestep interactions between agents in the right way (not always easy). Right now I’m using BatLazyList.seq to generate successive sets of matrices in a population genetic simulation (with matrices, not collections of agents). I’m switching to Core.Sequence, though.
Or fun x -> Some (x, x+1) if you want to start from the initial value at the beginning.
Some of @ivg’s remarks above, and earlier slides in the Conal Elliot presentation that @rizo linked, suggest to me that “unfold” is a somewhat loosely defined term, so that anything that applies a function repeatedly to build a complex data structure from an initial value can be considered an unfold, at least in relation to a corresponding “fold” function that applies a function repeatedly to a complex structure in order to calculate a result.
If that is correct, then I was kind of asking the wrong question. I understood how various unfold functions worked, more or less (but I understand them better after these helpful comments), and I just didn’t understand that e.g. Clojure’s iterate or BatLazyList’s seq are unfold functions. The fact that some fold functions are called “fold” and others aren’t would be the result of tradition or arbitrary decisions.
On the other hand, on slide 15 of the Elliot presentation, there’s a semi-formal argument that an unfold with a state-carrying value in the applied function is what is precisely dual to the usual fold functions. This seems to imply that seq/iterate are not fold functions in the fullest sense.
That slide kind of answers my original question, but maybe the term “unfold” is also used in a looser sense.
(I don’t understand everything in the Haskell type language in the slides, but I can figure out enough to get the general idea.)