Functorial vs "comparator" container interfaces?

I wonder if there has been any performance evaluation, of e.g. Set and Map, that compares the functorial (Stdlib Make functor style) versus “comparator” (Base-style) interfaces? To be concrete, suppose we define a Make functor over the tree data structure underlying the Base/Core Map module as below to obtain a type whose values are just trees and the compare function is obtained from the functor argument rather than the individual values.

Is it immediately obvious that this functorial interface should have the same performance as the comparator-style interface exposed by Base by default? I am thinking about e.g. the compiler’s ability to inline compare functions in the two cases. With the comparator interface, IIUC there is only one copy of the compiled code for e.g. Map, and furthermore the compare function needs to be read out of a record field each time the external interface boundary is crossed. So it seems “unlikely” that the compare functions can be inlined. On the other hand, with the functorial interface, is there at least the chance that the compare function could be inlined?

There is also a question about whether it matters that there is an “extra” indirection every time the external interface is crossed (due to the toplevel record containing the compare closure and tree itself). For instance, naively it seems that implementing Map.of_alist outside of Map would need to do perhaps-meaningfully more work than defining it inside Map. Has this sort of thing been evaluated and understood to be a non-issue?

I’m mostly wondering if anyone has expectations based on understanding of the compiler, or has been learned already from experiments with recent compilers (i.e. flambda).

For other context regarding functorial interfaces, there is definitely a point that favors the comparator-based interface where it can avoid compiled code duplication via a quadratic (number data structures times number of key types) number of functor applications for e.g. String.Map modules for each key type like String and data structure like Map.

But, IIUC there are several drawbacks of the comparator-based interface, independent of performance considerations. Comparators are values, in particular records with a field of function type, which means that:

  1. Marshaling values whose representations involve a comparator-based container will require enabling support for closures, which is slower and much more stringent with respect to interoperability. In use cases I commonly see, there are data structures with a lot of implicit sharing where the structure of the sharing is irregular, and so making it explicit in order to use a serialization system that does not support sharing well would be very invasive at the source code level, and a lot of work.
  2. Comparators are extremely inconvenient when used in recursive modules, since the key module will need a comparator, and thereby violate the recursive module safety check. For example, code like an analogue of the manual’s example of a tree with nodes that have sets of children becomes intricate and requires duplicating the signature numerous times.

Thoughts? Am I the only one who would rather keep the functor-style APIs (in cases where the number of functor applications does not explode problematically)?

Functorial interface to Core.Map:

open Core

module Make (Key : sig
  type t [@@deriving compare, sexp_of]
end) : sig
  type key = Key.t

  module Key : sig
    type t = key

    include Comparator.S with type t := t

  include Core_kernel.Map_intf.Make_S_plain_tree(Key).S

  val compare : ('a -> 'a -> int) -> 'a t -> 'a t -> int
end = struct
  module KeyMap = Core.Map.Make_plain (Key)
  module Key = KeyMap.Key

  type key = Key.t

  include KeyMap.Tree

  let compare = compare_direct

I personally prefer comparator-based Map/Set APIs over functor-style APIs, because the latter requires extra boilerplate to set up, and then extra verbosity (using a module name I have to come up with) at every use site. I don’t have any experience with using recursive modules, which may be core to your question.

I haven’t looked into which style is better optimized by the OCaml compiler, but in principle I think they are both typically amenable to inlining. In the comparator case, “constant propagation” of the comparator is enough for global optimization to recognize that the comparator can be inlined into a type-specialized map/set function.