A Y combinator you can actually use?

One thing that’s cool about the Y combinator is that it lets you “overload” recursion with extra stuff. So instead of doing normal recursion, like this:

# let rec fac n = if n = 1 then 1 else n * fac (n - 1);;
val fac : int -> int = <fun>
# fac 12;;
- : int = 479001600

You can do this, where you “factor the recursion out” into a separate function:

# let rec y f x = f (y f) x;;
val y : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>
# let fac_ fac n = if n = 1 then 1 else n * fac (n - 1);;
val fac_ : (int -> int) -> int -> int = <fun>
# y fac_ 12;;
- : int = 479001600

That lets you define higher-order functions you can pass these y-ified single-recursive-step functions into, which can augment each recursive call with, for example, a side effect:

# let trace name f_ f x = let r = f_ f x in Printf.printf "%d <- %s %d\n%!" r name x; r;;
val trace : string -> ('a -> int -> int) -> 'a -> int -> int = <fun>
# y (trace "fac called on" fac_) 12;;
1 <- fac called on 1
2 <- fac called on 2
6 <- fac called on 3
24 <- fac called on 4
120 <- fac called on 5
720 <- fac called on 6
5040 <- fac called on 7
40320 <- fac called on 8
362880 <- fac called on 9
3628800 <- fac called on 10
39916800 <- fac called on 11
479001600 <- fac called on 12
- : int = 479001600

But given that Y is forward-recursive, you blow up your stack if you use this technique with large inputs:

# y (trace "fac called on" fac_) 5000000;;
Stack overflow during evaluation (looping recursion?).

I was wondering, has anyone come up with a trick for implementing a version of Y that uses constant stack space? There isn’t an obvious way to naively make Curry’s definition tail recursive with the recursive call to y in the middle of those two f-s. Maybe with some weird imperative thing? I played around a little with a while loop, but I’m also not sure what (if anything!) the imperative analogue of Curry’s mind-bendy recursive definition would be.


Did you try original factorial for big input? It could give you stack overflow, despite the absence of fix point combinators.

P.S. Also, for my students, I refer to fixed point combinator for call-by-value as ‘Z- ombinator’.

Right, so the regular non-Y-ified factorial blows up your stack on large inputs, but it’s easy to make that one tail recursive:

# let tail_fac n = let rec loop acc = function 1 -> acc | n -> loop (n * acc) (n - 1) in loop 1 n;;
val tail_fac : int -> int = <fun>
# tail_fac 5000000;;
- : int = 0

Wrong answer due to the datatype, but no stack overflow.

the (generalized) transformation which converts a non-tail-recursive function to one which is tail-recursive is called continuation-passing. the accumulator solution is something you arrive at when you do a couple of steps after continuation-passing. I have not tried to transform fix (aka your y) using that style.

usually when I do this transformation, it’s to a function which has a base-case (for which you use the continuation parameters to “return”)

Hmm, that’s very interesting. Y’s lack of a base case has certainly been an issue for me!

Do you have a simple example of how you can use a CPS transform to tail-recursiv-ize a function? The way I’m in the habit of doing it (e.g. tail_fac above) doesn’t involve CPS. Maybe this is the kind of thing you had in mind?


As @Kakadu and @bufordrat already noticed, fac is not tail-recursive… but in y, the call to f is indeed in tail-call position. The following code does not overflow:

let fac_tail fac_tail accu n = if n = 1 then accu else fac_tail (n * accu) (n - 1)

let rec y f x = f (y f) x

let trace2 name f_ f accu x = let r = f_ f accu x in Printf.printf "%d <- %s %d %d\n%!" r name accu x; r

let _ = y (trace2 "fac called on" fac_tail) 1 500000

WHOA, SNAP!! So all you have to do is add the accumulating parameter to trace and then you can write tail-recursive fac the normal way!

Thanks, this is a way easier solution than I was anticipating.

OCaml 5 helps with this issue in a couple of ways.

First, it substantially increases the default stack size, so you’re unlikely to encounter the recursion limit (See: Bump default maximum stack size by xavierleroy · Pull Request #11238 · ocaml/ocaml · GitHub).

Second, it provides the facilities to write a fixed-point combinator that won’t overflow the stack:

let ($$) f x = Effect.Deep.try_with f x {effc=fun _ -> None}
let rec y f x = f (y f) $$ x

With this y, even if you make the stack limit very low, your y (trace "fac called on" fac_) 5000000 shouldn’t run out of stack space.


sure: the general idea of cps is that all functions must be called with either a value or a function, so first we have to convert our functions:

(* original version *)
let rec fac x = if x <= 1 then 1 else x * fac (x - 1)

(* else branch is a complex expression, we should rearrange it a bit in its steps *)
(* also we rename infix functions *)
  let p = pred x in
  let n = fac p in
  let m = mul n x in
(* pseudo cps *)
  pred x  (fun p ->
  fac p   (fun n ->
  mul x n (fun m ->
(* this requires us to do the following conversions: *)
let mulK x y k = k (x * y)
let predK x k = k (x - 1)
let rec facK x k = if x <= 1 then k 1 else
  predK x  (fun p ->
  facK p   (fun n ->
  mulK x n (fun m ->
    k m

let fac x = facK x (fun x -> x)

but we’ll relax the requirement a bit and say stuff like arithmetic is more or less a value.
(this will make both our lives easier, trust me):

let rec facK x k = if x <= 1 then k 1 else
  facK (x-1) (fun n -> k (x * n))

now we want to “defunctionalize”, to recover a nicer representation (less closures!).
this would actually get us closer to the correct accumulator representation.
to defunctionalize, we collect our lambdas and pay attention to any “environment” passed to them:

k0: (fun x -> x) {}
k1: (fun n -> k (x * n)) {x, k}

(* environments *)
type cont = K0 | K1 of { x : int; k : cont }
(* notice that this is ≈ int list *)

(* recover the lambdas *)
let rec recover = function
  | K0 -> (fun x -> x)
  | K1 {x; k} -> (fun n -> (recover k) (x * n))

let rec facD x k = if x <= 1 then (recover k) x else
  facD (x-1) (K1{x; k})

let fac x = facD x K0

now let’s #trace our functions and see how it’ll build and eval the K’s for fac 3 (removing some of the noise)

facD 3 K0
facD 2 (K1 3 K0)
facD 1 (K1 2 K1 3 K0)
recover (K1 2 K1 3 K0) 1
recover (K1 3 K0) 2
recover K0 6
... return ...
- : int = 6

compare with the original non-tailrec fac:

fac <-- 3
fac <-- 2
fac <-- 1
fac --> 1
fac --> 2
fac --> 6
- : int = 6

we notice a few interesting things:

  • the cont, which we noticed is an int list is actually acting like a stack
    • and not any stack, it corresponds to the call stack.
  • we unfold this stack (aka list) as we count down the tail-calls to facD with decreasing x,
    and then we immediately fold it again with recover.
  • we can erase an unfold followed by a fold, and just accumulate iteratively

to be more clear:

facD 3 [] => facD 2 [3] => facD 1 [2; 3] => fold_left ( * ) 1 [2; 3] => 6
(* fuse unfold with fold *)
facD 3 1 => facD 2 3 => facD 1 6 => 6

and that’s how we arrive at the accumulator version:

let rec facA x a = if x <= 1 then a else facA (x-1) (x*a)

wait how does this work? it’s a deep handler that handles no effect? what happens to the stack with it around? does the stack just get allocated on the heap and that’s why it “never overflows”? or am I guessing wrong?

Whoa, thank you so much! I haven’t experimented with the 5.0 compiler yet, but I’m looking forward to trying this out. I kinda had a hunch there would be some cool effects system way of rigging up a constant stack space Y combinator, what with the effects system’s ability to mess with the memory stack.

Thank you for taking the time to go through this! Absolutely fascinating how you can derive the normal tail recursive definition from a CPS transform; I’ve never heard of that before.

just wanna applaud @hyphenrf 's exposition there. And add that I think everyone should learn how defunctionalization works – it’s amazing stuff. I know it from Reynold’s seminal paper “Definitional Interpreters for Higher-order programming languages” ( https://dl.acm.org/doi/10.1145/800194.805852 ) and from a Felleisen paper about the CEK-machine (which he derives from CPS, of course!) that he presented at an IFIP WG meeting. I wish I remembered the latter reference, sigh.

But Felleisen and his students and advisor (Dan Friedman) did a ton of work using defunctionalization as a tool to convert CPS to first-order programs. It’s great stuff, and very, very useful for working programmers who find themselves needing to write tail-recursive code for complex programs and wanting to not go higher-order (for which there are many, many good reasons).

1 Like

@yallop if I understand correctly, your solution works by installing a handler at each recursive call, which will segment the OCaml stack in separate fragments. But, if I remember correctly, there is no strong decision on how we should measure the stack in OCaml 5.0 to decide to fail with a Stack_overflow. Currently I guess we count each stack fragment separately, so the “measured stack size” of your solution remains small, but we have also discussed summing the size of each fragment, and if we change to do this your solution will not work anymore – users will have to increase the stack size limit again.

Context: The Stack_overflow error is useful in catching programming mistakes that result in infinite loops. Not failing with a stack-overflow tends to result in such faulty programs eating all memory and becoming unresponsive instead of failing early, which is a user-convenience bug. Currently (with separate sums for each fragment), buggy programs that use effect handlers will not fail with a Stack_overflow. It’s a balancing act of whether we want to easily catch silly mistakes, or we want to accept more programs (some of which are sensible, such as @yallop’s one above).
We consider tail calls as an “opt-out” of stack accounting (we expect users to write interesting programs with arbitrarily many tail calls), at the cost of accepting more infinite loops (but: they don’t eat all memory, so they are less inconvenient for users). The current accounting strategy handles effect handlers in a similar way, but it is unclear that this is a good idea – probably not.

See the discussion in What is a good default stack size for OCaml programs? · Issue #10795 · ocaml/ocaml · GitHub .



In the original paper, Felleisen did not derive the CEK machine from CPS and defunctionalization. Perhaps you are thinking about the following paper which studies systematic correspondences between abstract machines and evaluator functions via CPS and defunctionalization, including CEK: Mads Sig Ager, Dariusz Biernacki, Olivier Danvy, and Jan Midtgaard. A functional correspondence between evaluators and abstract machines. PPDP 2003.

The following paper helped popularize defunctionalization, in particular it advocated defunctionalization for reasoning about, analyzing, and transforming programs: Olivier Danvy and Lasse R. Nielsen. Defunctionalization at work. PPDP 2001.

For an overview of the topic, I recommend Danvy’s dissertation An Analytical Approach to Programs as Data Objects. University of Aarhus : BRICS Research Series, 2006.


Fascinating. I take it that would mean the “installing an effect handler at each recursive call” approach just uses a tiny (and tin-ily growing) but still not exactly constant amount of stack space?

The space usage of (non-tail-recursive) factorial using yallop’s combinator is linear, just like the standard version. It is not reduced compared to the standard version, in fact it probably has a noticeably higher constant factor due to the extra metadata of stack fragments. Under OCaml 5, neither versions (effect-handlers and direct) use the native/system stack, they use stack fragments allocated separately (as do all OCaml functions under OCaml 5, unlike OCaml 4). The naive version uses a single stack fragment that is repeatedly resized and becomes very large, when the handler version uses very many small fragments.

1 Like

Whatever it was I read, it was pre-1994, since that’s when I exited the field. I don’t have access to that IFIP WG paper anymore (I had it on-paper, sheesh). In any case, it’d good to know that this stuff has become well-worked-out by others, too.

Thank you for these pointers!

Do you have a simple example of how you can use a CPS transform to tail-recursiv-ize a function?

You can find a step by step explanation here.