In lazily evaluated languages like Haskell, there is is an operational runtime difference between foldl and foldr even when you are applying an associative reduce operation. This is because the former is essentially left-recursive while the latter implicitly allows early termination if the tail does not have to be evaluated (thanks to weak head normal form). Similarly, in imperative-style languages with external iterators, it is trivial to exit a loop early with a return or break.
In OCaml, most data structures that support generic catamorphisms implement val fold_left : ('b -> 'a -> 'b) -> 'b -> 'a t -> 'b and val iter : ('a -> unit) -> 'a t -> unit). However, as far as I can tell, there is no “standard” way to do early termination. Even if some structure happens to implement fold_right (which appears to be somewhat unusual), this does not allow for early termination and instead effectively iterates in “reverse” (judging by common implementations in the wild).
Is there a clean, commonly used abstraction that I’ve missed that is appropriate for this purpose? You can use value-bearing exceptions for this purpose, but I’m not happy with using exceptions for normal control flow. The best I’ve come up with is a generic wrapper shaped like type 'a t = Continue of 'a | Done of 'a which acts as some sort of degenerate monad. However, I don’t think I’ve seen any such uses in the wild. You would also need to special case all early-termination fold operators to use this wrapper. How do people deal with this in practice outside of imperative loops?
(Note that you can also implement early termination by using a “manual” imperative-style loop implemented with a tail-recursive function on top of a Seq.t. This question is inspired by @dbuenzli’s comment here. The fact of the matter is that left-folds are not the most generic form of iteration, and there doesn’t seem to be an accepted way to do this generically.)
Ah, not sure how I missed that. (FWIW, I wouldn’t actually consider that “degenerate” because it has proper “left” and “right” types; this is actually cleaner because it allows you to signal different semantics in the case that more data is “expected”.)
In any case, this does not seem to be a “standard” idiom in that I wouldn’t expect data structures outside of base to implement it. I’m looking for a mechanism that is “widely” used and that I can expect consumers to use for free since they already have code written for that pattern. Is this something that has been considered for inclusion in the standard library? Right now, the only way to do early termination is to use exception-based control flow with a fold (whether through a wrapper like with_return or directly) or else to use Seq.t iterators. It’s true that the Seq method is fully general, but it’s also true that this comes with the cost of extra cons cell allocations that you might otherwise be able to avoid. I’m also curious how widely held @dbuenzli’s convictions are.
How interesting: I was unaware of with_return. In future, I’ll use it! grin
As a systems-jock, I would avoid using Seq-based iterators: the extra allocation, as well as just the extra knowledge overhead, would be enough to turn me away. After all, it isn’t very common to want to terminate early, and in those cases, something like with_return seems like a great idea. Indeed, I’d hope that in some future version of the OCaml compiler, with sufficient inlining it might infer that the exception was raised nearby to where it was caught, and hence might implement it efficiently with no allocation. A boy can dream. For similar reasons, I would avoid a version of fold that returned some sort of sum of “keep going” and “stop now with this result”.
One of the reasons I program in OCaml, is that it’s a language that lends itself to thinking imperatively when I want; so it doesn’t seem an imposition to use exceptions for situations like this.
This is an interesting perspective, and it’s true that this one of the nice aspects of OCaml. Do you have a rough idea of the present-day runtime cost of exceptions for control flow like this? Local exceptions do seem to be used more widely than I might expect for cases like these, but I don’t have a good understanding of how this translates under the hood.
A long, long time ago, Emmanuel Chailloux told me that OCaml exceptions were significantly cheaper than dot.Net exceptions (for whatever that’s worth). I also remember there was work in some language/compiler (might have been a industrial JVM) to infer this sort of “exception raised here is always caught there, never reraised, and we know what the intervening stack-frames will be” in order to eliminate the exception entirely and just transfer the params from the raised location to the catcher location.
TL;DW - they are relatively cheap. Creating an exception without a parameter won’t allocate whereas one with a parameter will, so parameter-less ones are a better choice for situations where allocation behavior matters. There’s more information along these lines in the talk.
Certainly opinions about exception usage vary widely and I am sympathetic to the desire to use them sparingly. Like @Chet_Murthy, I tend to see them as a “benign effect” (a la Harper) in situations like the one being discussed, alongside other imperative features like iteration and mutable references, but YMMV.
My opinion is that I do not have any issue with narrowly scoped uses of exceptions raised with raise_notrace, and they are very good for implementing this sort of thing in general ways. So personally I find that it is ok to expect every data structure that can be “traversed” to implement an iter function, at which point I will use the iter library. For early termination, there are the *_while operations such as:
val fold_while : ('a -> 'b -> 'a * [ `Stop | `Continue ]) -> 'a -> 'b t -> 'a
(** Folds over elements of the iterator, stopping early if the accumulator
returns [('a, `Stop)] *)
let fold_while f s seq =
let exception ExitFoldWhile in
let state = ref s in
let consume x =
let acc, cont = f !state x in
state := acc;
match cont with
| `Stop -> raise_notrace ExitFoldWhile
| `Continue -> ()
in
try
seq consume;
!state
with ExitFoldWhile -> !state
Note that the ExitFoldWhile exception is constant and so will not be allocated, and it is raised with raise_notrace so it will not clobber any existing exception backtrace and be fast. In my experience in the cases where f is small, it is often inlined, and the intermediate pair it produces is eliminated, and flambda compiles a lot of iter-using code down to loops. There is some small cost to pushing the trap handler (the implementation of try), but it is basically negligible if not in a hot loop and here it is outside the seq consume loop. Exceptions in OCaml, especially constant ones, are pretty fast.
(I’m not aware of the current compilers optimizing this use to avoid using the dynamic exception mechanism, but the analysis in the original PR to compile this use of local exceptions to staticcatch/staticraise does not look to be terribly complicated.)
This approach is general, only requiring each data structure to have an iter function, and it has good performance for both short and long sequences. There is the drawback that, being internal iterators, operations like zip can’t be supported. In my experience uses that need those are the very small minority, and that is when cooperation beyond an iter function is needed from data structure implementations, and that is when Stdlib.Seq is useful as a common interface that can be widely implemented. Seq is still relatively young though, so you won’t see everything implement it.
Yeah, I was originally envisioning something similar to this after reading the comments above about non-allocating exceptions. Specifically, storing the state in a ref and then catching a local niladic early-termination exception. At that point, though, the code becomes imperative enough in style and “unusual” enough in control flow that I would probably prefer the more explicit loop style in practice (assuming the structure of interest exposed a way to implement this).
I agree that good old fashioned loops have their place and sometimes are the simplest solution. But I would encourage people not to underappreciate how different code using a bunch of iter combinators feels. It is a declarative and very externally functional experience, compared to a deep nest of loops. The imperative implementation does not leak out of the combinator interface.
Also unless I am mistaken, what you mean is to write the loop outside of the data structure implementation, and then
will land back in the same general area as Stdlib.Seq since what you need to write the loop outside of the data structure implementation is equivalent to what you need to provide an external iterator.
I thought the with_return solution was … pretty sweet? E.g.
# with_return (fun r -> List.fold_left (fun acc x -> if x = 0 then r.return acc else acc+x) 0 [1;2;3;0;4;5;6]);;
- : int = 6
Seems like it conses, but only two blocks (the return object and on the exceptional-return path), and based on the conversation above, maybe one can remove the latter with some further hacking.
Yes, I am happy that we have both with_return and the likes of Iter.*_while. Depending on the code at hand I will use either. For code that is using iterators anyway, the _while operations fit very well. For other cases, or ad hoc non-tail recursive functions that I want to jump out of, with_return fits well.
(As a CPS guy what I really want is static one-shot continuations, and I’d even be happy if it looked like goto k x in the source, but that is unlikely to be a popular position )
For sure I’ve written my share of loops (usually small recursive functions) but it always felt … uncomfortable. As you say, combinators just feel better.
FWIW, my initial instinct would actually be to use Seq.t and its combinators (and possibly those from oseq) to make this as ergonomic as possible. I’ve just gotten the impression that people prefer to avoid this and I’m trying to understand why and how people work around that. I think that Seq was a good addition to the standard library even if there wasn’t wide agreement when it was merged.
Others may have other reasons, but as a systems jock who has programmed in C/C++ and at low levels in runtimes, my understanding is that Seq introduces an intermediate layer that costs in allocations. So I wouldn’t use it simply for that reason. Also, when iterating over some data-structure, unless I need to be able to abort early, Seq doesn’t add anything (that I know of): I’m pefectly OK with using different iteration combinators for different data-structures.
Honestly, before I knew about with_return, I could have been convinced to use Seq (“it’s nicer than having to write all these recursive/while-loops”) but know that I know about with_return, that’s gone away too.
Seq is nice because it’s easy to program against if you’re used to list recursion, but it’s a bit slower and heavier than other approaches to iteration. For me, that’s not a reason not to use it 90% of the time, but it’s good to be aware of if you’re trying to eek out every last bit of performance. Especially if you’re doing something where the body of the loop is cheap and N is large, Seq might cost you more than the computation you actually care about. However, compiling with -O2, especially with flambda does mitigate the cost a bit, but it doesn’t get rid of it entirely.
Again, from my perspective this is not a reason to avoid Seq altogether—it’s actually surprisingly fast considering what the semantics are, but it is a good reason to be aware of what it costs and what the alternatives are.
I know you’re not partial to exceptions for “normal” control flow, but it’s worth adding to the great explanations here about how cheap they are relative to other languages (and they’re extremely so), an emphasis on how also they are by convention used in OCaml this way you describe.
That is to say, the Stdlib directly supports this use-case at user-level by providing a standard exception, Exit, which is promised to never be raised from Stdlib functions. It also uses exceptions like Invalid_arg and Not_found in its APIs knowing that they’re cheap.