A tail-recursive List.flatten in Stdlib?

I was perusing the OCaml source code for Stdlib.List and noticed that List.flatten is implemented in a non-tail-recursive manner.

let rec flatten = function
    [] -> []
  | l::r -> l @ flatten r

Is there a compelling reason not to implement this using tail_mod_cons to make this function tail-recursive?

It doesn’t seem like a very difficult change to make.

let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | []::xs -> flatten xs
  | [x]::xs -> x :: flatten xs
  | (x::x'::xs)::ys -> x :: x' :: flatten (xs::ys)
1 Like

The definition you propose can be simplified to:

let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | []::xs -> flatten xs
  | (x::xs)::ys -> x :: flatten (xs::ys)

This function performs twice as many allocations as the total number of elements (because each element gives rise to two allocations, x :: and xs ::). Hence it performs twice as many allocations as the (non-tail-recursive) version found in the standard library. It is better than the naïve tail-recursive function using an accumulator which performs three times as many allocations as the total number of elements, but it will still lose against the non-tail-recursive version for most practical inputs.

Cheers,
Nicolas

That definition can be transformed to avoid the extra allocation by specializing the call flatten (xs :: ys) via a function flatten_cons that takes xs and ys separately, like this:

let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | []::xs -> flatten xs
  | (x::xs)::ys -> x :: flatten_cons xs ys
and[@tail_mod_cons] flatten_cons xs ys =
  match xs with
  | [] -> flatten ys
  | x::xs -> x :: flatten_cons xs ys

The paper Tail Modulo Cons (by @let-def, @Elarnon and @gasche) gives a similar implementation, but obtained by inlining append in the non-tail-recursive version, as does the manual.

3 Likes

A bit simpler and fixing the xs that should be ys

let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | xs::ys -> flatten_cons xs ys
and[@tail_mod_cons] flatten_cons xs ys =
  match xs with
  | [] -> flatten ys
  | x::xs -> x :: flatten_cons xs ys
1 Like

Note that, if you do not also annotate flatten with [@tail_mod_cons], then flatten_cons will be calling a non-specialized function ([] -> flatten ys), so the optimization cannot trigger. Both functions need to be annotated.

1 Like

The rule we followed so far in the stdlib is to convert functions to use TMC on-demand (no big change where we reviewed all trmc-able functions), and to perform a micro-benchmark to check that the TMC definition is as fast as the non-tail-rec one, including on small lists. In our experience this generally requires some manual unrolling of the TMC version (one or two steps of unrolling is often enough). See for example TRMC implementation of @ by yallop · Pull Request #11859 · ocaml/ocaml · GitHub , which moved @ to be tail-recursive.

Anyone should feel free to propose a TMC version of List.flatten, as long as they follow the same policy: write a micro-benchmark to check the impact on small inputs.

4 Likes

Having done some testing on speed the impact seems to range from 8-20% slower with TMC using ocamlc. Results with ocamlopt were worse seeing ~40-50+% longer execution times.

I cheerfully withdraw my suggestion in the face of pragmatism.

Note that there are some GC effects related to the write barrier that need to be taken into consideration when benchmarking; see the discussion at

Cheers,
Nicolas