The algorithm on that page is tail recursive, but not really “in CPS”. In CPS, by definition, all calls must be tail calls. So, in order to qualify, that f x must be tail position, which it is not. A more canonical CPS’d map is as follows (with f taken as being in CPS):
let rec map f xs k = match xs with
| [] -> k []
| x :: xs -> f x (fun x' -> map f xs (fun xs' -> k (x' :: xs')))
Or, with a nicety, define let (let@) = (@@), then you can write:
let rec map f xs k = match xs with
| [] -> k []
| x :: xs ->
let@ x' = f x in
let@ xs' = map f xs in
k (x' :: xs')
so it looks as if the CPS version simply trades cheap stack space (which might overflow for large lists) for chunky heap allocations
This is exactly right. Your stack frames become heap allocated continuation closures. If you want an illustrative example of this, you can defunctionalise a CPS’d algorithm and you’ll find the resultant inductive datatype has the shape of a stack (with relevant free variables being carried around).
E.g. consider a factorial function in CPS, with k monomorphised (int -> 'a)['a := int]:
let rec fact n k = match n with
| 0 -> k 1
| n -> fact (n - 1) (fun n' -> k (n * n'))
let go n = fact n (fun n -> n)
We see the continuation (fun n' -> k (n * n')) : int -> int has the free variables { k : int -> int, n : int }. We see our other continuation (fun n -> n) : int -> int (with no free varibles). We’ll invent a datatype to represent functions of type int -> int. This will look like this:
type clos =
| C0 (* fun n -> n *)
| C1 of clos * int (* fun n' -> k (n' * n) *)
Now, without spelling everything out, we replace these higher-order functions with instances of these datatypes, hoist the bodies into a dispatch routine, and replace calls to int -> int functions with calls to this dispatch routine:
type clos =
| C0 (* fun n -> n *)
| C1 of clos * int (* fun n' -> k (n' * n) *)
let rec dispatch c arg = match c with
| C0 -> arg (* fun n -> n *)
| C1 (k, n) -> dispatch k (arg * n) (* fun n' -> k (n' * n) *)
and fact n k = match n with
| 0 -> dispatch k 1
| n -> fact (n - 1) (C1 (k, n))
and go n = fact n C0
let () = Printf.printf "fact(5) = %d\n" (go 5)
As you can see, clos is effectively a linked list carrying integers. This may seem kind of contrived, but it’s a good program transformation to know (see Danvy’s Defunctionalization at Work); I’ve converted a CPS’d OCaml algorithm into C doing this a few times.
It used to be somewhat common to rewrite OCaml functions into CPS to avoid the stack. However, as of OCaml 5, the stack is actually heap-allocated growable fibers (connected by parent pointers - i.e. a cactus stack). So, you’re far less likely to overflow the “stack” in OCaml 5. Still, CPS has actual algorithmic utility (e.g. see the classic A-Normal Form and CPS conversion algorithms, which are effectively in CPS). It goes without saying that monadic style is a generalisation of CPS, whereby the dispatching of the continuation (or not) is controlled by a guard (bind operator). So, it’s excellent background and really ought to be taught first.
As for compilers (the topic of that article you linked), you may be interested in Andrew Kennedy’s paper Compiling with Continuations, Continued which spells out some of the benefits of a CPS IR in a compiler. It has somewhat fallen out of fashion as a representation to lower faithfully (as in Appel’s “Compiling with Continuations”), but has garnered some recent interest as a mid-level IR for optimisation.