Detecting identity functions in Flambda


There is our new blogpost about detecting identity functions in Flambda written by Léo Boitel, who did an internship at OCamlPro directed by Vincent Laviron. He worked on the possibility of annotating functions with an attribute telling the compiler that the function should be trivial, and always return a value strictly equivalent to its argument. Here is his vision:

“OCaml’s strong typing system is one of its main perks: it allows to write safe code thanks to the abstraction it provides. Most of the basic design mistakes will directly result into a typing error and the user cannot mess up the memory as it is automatically handled by the compiler and runtime.
However, these perks keep a power-user from implementing some optimizations, in particular those linked to memory representation as they cannot be accessed directly.”


Very nice and interesting work! Congrats to Léo!

The paragraph about “higher-order capabilities” is a bit cryptic for me: is f recognized as an identity as soon as f is? And what about Set.Make(X).map and so on? I think about another approach: instead of requiring to write identity functions between two types t and u, we could allow (e : t :> u) as soon as we can recognize that t and u share the same memory representation (or, more generally, that every value of t has a memory representation that is also valid as a value of u). The benefit of coercion is that it naturally allows deep transformations, and I think that it is what @EduardoRFS suggested in Records, tuples and variants subtyping, but one of the difficulties is perhaps to recognize that the memory representations of t and u are compatible. (We can then mix the approaches by allowing coercions between t and u as soon as we are able to write an identity function between the two types.)

I cannot resist to share a related puzzle… You can check that there should be an identity function ('a, 'b, 'c) plus -> ('a, 'b, 'c) cursor between the types:

type zero = [`Zero] and 'a succ = [`Succ of 'a]

type ('a, 'b, 'c) plus =
  | PO : (zero, 'a, 'a) plus
  | PS : ('a, 'b, 'c) plus -> ('a succ, 'b, 'c succ) plus

type ('a, 'b, 'c) cursor =
  | CO : (zero, 'a, 'a) cursor
  | CS : ('a, 'b succ, 'c) cursor -> ('a succ, 'b, 'c) cursor

(You can check for instance that for all n >= 0 and for all types 'a, 'b, 'c, we have PS^n PO : ('a, 'b, 'c) plus if and only if CS^n CO : ('a, 'b, 'c) cursor.)

However, I have found very difficult to write such an identity function that will not be in quadratic time with respect to the size of its argument (I think I have a linear-time implementation, but it is incredibly convoluted, and in retrospect it could even have been a good candidate for I prefer not to share it right now not to spoil if someone wants to search a bit). I don’t know if there is some functions ('a, 'b, 'c) plus -> ('a, 'b, 'c) cursor for which the approach presented here can prove to be an identity.

1 Like

A significant part of the internship was spent looking at various simplifications of the problem, on simpler languages. Some of it later got adapted to the real OCaml use cases, but the “higher-order capabilities” part corresponds to an experiment where identity functions are detected (or inferred) during a type-checking pass. In the real OCaml language it’s not practical to detect identity functions during type-checking (too complicated), and not practical either to re-do a typing pass later in the compiler.

This is also a possible solution, but we chose to focus on post-type checking passes as I’m more familiar with them. We did consider the issue when looking at non-converging recursive functions, but we only focused on what would be correct if we had a way to determine if all values of a type are also valid in another type. We didn’t take any time to try to implement it. So the subtyping approach is still open for experimentation.