This function performs two allocations per element (each element gives rise to the two allocations x :: and xs ::), hence twice as many allocations as the (non-tail-recursive) version found in the standard library. It is better than the naïve tail-recursive function using an accumulator, which performs three allocations per element, but it will still lose against the non-tail-recursive version on most practical inputs.
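For concreteness, here is the definition being discussed, reconstructed from the description above (the two allocations per element are the x :: in the result and the xs :: in the recursive call):

```ocaml
(* Reconstructed from the description above: two allocations per element,
   one for the [x :: ...] cell in the result and one for the [xs :: ys]
   cell passed to the recursive call. *)
let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | [] :: xs -> flatten xs
  | (x :: xs) :: ys -> x :: flatten (xs :: ys)
```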
That definition can be transformed to avoid the extra allocation by specializing the call flatten (xs :: ys) via a function flatten_cons that takes xs and ys separately, like this:
```ocaml
let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | [] :: xs -> flatten xs
  | (x :: xs) :: ys -> x :: flatten_cons xs ys

and[@tail_mod_cons] flatten_cons xs ys =
  match xs with
  | [] -> flatten ys
  | x :: xs -> x :: flatten_cons xs ys
```
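As a quick sanity check that the specialized version behaves like the original (it needs OCaml 4.14 or later, which introduced [@tail_mod_cons]):

```ocaml
(* Simple correctness check; empty inner lists are skipped as expected. *)
let () = assert (flatten [[1; 2]; []; [3]; [4; 5]] = [1; 2; 3; 4; 5])
```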
The paper Tail Modulo Cons (by @let-def, @Elarnon and @gasche) gives a similar implementation, but obtained by inlining append in the non-tail-recursive version, as does the manual.
Note that, if you do not also annotate flatten with [@tail_mod_cons], then the flatten ys call in flatten_cons (the [] case) goes to a function with no TMC-specialized version, so the optimization cannot trigger. Both functions need to be annotated.
The rule we have followed so far in the stdlib is to convert functions to TMC on demand (rather than a big change in which we review all TMC-able functions), and to run a micro-benchmark to check that the TMC definition is as fast as the non-tail-recursive one, including on small lists. In our experience this generally requires some manual unrolling of the TMC version (one or two steps of unrolling is often enough). See for example TRMC implementation of @ by yallop · Pull Request #11859 · ocaml/ocaml · GitHub, which made @ tail-recursive.
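To make "manual unrolling" concrete, here is a hedged sketch of what one unrolling step can look like for a TMC append; the shape is illustrative only and not necessarily what the stdlib PR does:

```ocaml
(* Illustrative only: the recursive loop handles two elements per call and
   one-element lists never enter the destination-passing-style loop, which
   helps on very small inputs. The actual stdlib code may differ. *)
let[@tail_mod_cons] rec append l1 l2 =
  match l1 with
  | [] -> l2
  | [x1] -> x1 :: l2
  | x1 :: x2 :: tl -> x1 :: x2 :: append tl l2
```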
Anyone should feel free to propose a TMC version of List.flatten, as long as they follow the same policy: write a micro-benchmark to check the impact on small inputs.
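For instance, a minimal micro-benchmark sketch along those lines (my own throwaway harness, assuming the flatten/flatten_cons definitions above are in scope; a real evaluation would use a proper benchmarking library and vary the list shapes):

```ocaml
(* Throwaway harness: compares stdlib List.concat against the TMC flatten
   above on many small inner lists, the case where TMC versions tend to
   lose without unrolling. *)
let time label f =
  let t0 = Sys.time () in
  f ();
  Printf.printf "%-20s %.3fs\n" label (Sys.time () -. t0)

let () =
  let small = List.init 1_000 (fun i -> [i; i + 1; i + 2]) in
  time "List.concat" (fun () ->
    for _ = 1 to 100_000 do ignore (List.concat small) done);
  time "TMC flatten" (fun () ->
    for _ = 1 to 100_000 do ignore (flatten small) done)
```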
Having done some speed testing, the impact seems to range from 8-20% slower with TMC using ocamlc. Results with ocamlopt were worse, showing roughly 40-50% or more longer execution times.
I cheerfully withdraw my suggestion in the face of pragmatism.