@toolslive i wouldn’t propose to merge this map
into the stdlib, as it will likely break all code that depends on the order in which f
is called.
However, yes, for other, non-higher-order functions, or where the evaluation order would be the same, there may be no harm in unrolling the tail-recursive portion in some crazy way, like done here. I haven’t looked at them in detail, though, just done some mental sketches of whether it would work, and what the performance effect is likely to be. Some of them have pretty different allocation needs.
Even with that, I think an obstacle would be justifying architecture-specific unrolling in what is meant to be a portable stdlib. I optimized and measured this map
for x86_64, but there are other platforms, bytecode, and all the JavaScript VMs. I don’t know what the best choice for those platforms is.
It seems like these kinds of unrolled functions are much more likely to go into community libraries or applications, which have an easier time supporting a much more restricted set of targets.
@domsj Thanks Yeah, the actual code is a bit complex. I’m not sure how easy this would be to generate by automatic optimization, though.
There is (was?) a proposal to have the compiler automatically generate what is effectively the Batteries approach. This is tail recursion modulo constructors by @let-def: https://github.com/ocaml/ocaml/pull/181. I’m not sure what the exact performance implications would be, or the status of the pull request. I actually wanted to install a compiler with the patch applied, and include TRMC-d List.map
in the benchmarking, but eventually ran out of time and didn’t do it.