Functor trouble -- How to organize functor-heavy code?

bnguyenvanyen · November 30, 2021, 9:57am

I have a library where functors have proliferated. It was quite painful to functorize a module deep in the stack, because of types not being equal anymore in many places.
Therefore I wonder if there is a better way to organize the code so that it would be less verbose and less painful to evolve.

To illustrate, let’s imagine we have module types A, B and C, then AB that depends on A and B, and ABC that depends on AB, A, B and C.
What should AB expose, and what should ABC take as input ? (and expose itself, for functors further down the line) ?

One possibility is to always pass everything as arguments, with all necessary with type constraints, and to expose as little as possible.
So you would have

module ABC (A : A) (B : B) (AB : AB with type a = A.t and type b = B.t) (C : C) = struct
  ...
end

But once you have many arguments with many types, this can become very verbose, and it’s difficult to get all the right equalities.

Another possibility would be for AB to re-expose A and B as submodules and only access them through there, so that you don’t need to pass them as arguments. Then when necessary you would use with module A = A which is less verbose if you have many types.
Something like:

module AB (A : A) (B : B) = struct
  module A = A
  module B = B
  ...
end

module ABC (AB : AB) = struct
  module AB = AB
  ...
end

Then when you need A, you do ABC.AB.A…
But I have the notion that this is not a great idea, as it’s easy to end up with multiple As that are not equal.

Yet another possibility, if you don’t actually want to parameterize on functor A B -> AB implementations,
is to only pass A and B, and derive AB in the body of the functor.

module ABC (A : A) (B : B) = struct
  module AB = AB (A) (B)
  ...
end

This looks like it should work well, but I have also ended up with type equality problems that I don’t really understand…

So, what’s the best approach ? What rules should one follow to make it all fit together nicely ? Is that even possible or is any refactoring necessarily painful ?
Thanks a lot,

octachron · November 30, 2021, 10:43am

A point that might help with the first option is that you can replace with constraint with an applicative path that compute the right type with a functor:

module ABT(A:A)(B:B) = struct
  module type T = AB with type a = A.t and type b = B.t
end
module ABC(A:A)(B:B)(AB: ABT(A)(B).T) = struct
   ...
end

The advantage is that the local length of the type of AB no longer grows with the number of equalities that you need to expose, and you only need to get the module type right once.

bnguyenvanyen · November 30, 2021, 1:47pm

Great advice, thank you ! Trying this pattern right now, and indeed it helps quite a bit.
However this does not totally answer what are the right rules to adopt.

For instance, if ABC depends explicitly on A, C and AB, but not on B (only through AB),
there is a choice to make between passing B as argument, to use AB : ABT (A) (B), or to not pass it, and writing the equations for this argument by hand.

In practice, the list of arguments can grow quite long, and many arguments deep in the stack might not be needed…

ivg · November 30, 2021, 3:46pm

Your code excellently exemplifies the weakness of functors. It is the common pitfall of using functors everywhere that leads to the hard to maintain and extremely hard to use code. Or, more shortly, to the overgeneralized code. Most of the time, you will find yourself applying the same functor to the same arguments. In fact, while powerful, functors are just an option for code parameterization, one of many, and most of the time not the best option. So my suggestion is to rethink your design and get rid of the functors in all places where they are not needed.

In OCaml, we have a few options when we need to parameterize our code. These options differ in their expressibility and usability. The art is to choose a mechanism that is expressible enough to solve the problem but that is the most lightweight tool in our box. Functors offer the most syntactically heavy and awkward to use abstraction but they enable us to parameterize our code over types. No matter how cool it sounds, in fact, it is a very rarely needed feature. As most of the time, we would like to abstract over some behavior, not over a representation. Even when we want to parameterize over the representation of the type, most of the time we can opt into making the representation itself abstract, rather than making our code an abstraction over that representation. This is especially true if we control the representation of that type, in other words, when we are not implementing a highly-parameterized library that provides generic algorithms that are independent of the data structures representations. And even in that case, we have some tradeoffs. Imagine that the implementors of the standard library of OCaml decided that instead of using a concrete type for the list, they will turn it into an abstraction,

module type list = sig 
  type 'a t 
  val (::) : 'a -> 'a t -> 'a t
  val hd : 'a t -> 'a
  val tl : 'a t ->'a t
end

So that every function that uses lists will now be a functor over a list representation. On one hand that would be cool, as we can now use any type that can implement the list signature as the representation of the list. We can even use arrays or maps or trees as lists without having to introduce intermediate data structures.

But such power comes with great responsibility, if not to say burden. We now need to parameterize all our code over the representation of the lists and we also need to decide at some time which representation our program will use. We would also need to eventually come up with some representation for a list. In general, it doesn’t sound that bad, especially if we would have type classes as in Haskell that make such kinds of abstractions rather lightweight. But OCaml functors are syntactically heavy-weight, have stronger typing discipline, and do not come with automatic instantiation machinery that creates instances on the fly (aka modular implicits).

In the end, we will find ourselves using the 'a list as our main abstraction and keep instantiating the list functor over it everywhere. Fortunately, the standard library developers didn’t make this error. They did choose functors for sets and maps though, which turned out in quite an awkward interface that makes many of us unhappy. That is why Janestreet made their own maps and sets that are polymorphic types, which are much easier to use.

Though, the standard library developers had their reasons to use functors for sets as they didn’t want to store the comparison function in the set representation, as this will prevent them from implementing set equality and will in general break structural comparison.

This illustrates that sometimes there are reasons to use functors, even when we are not parameterizing over types. Indeed, in OCaml’s vanilla maps and sets the representation is the same - an AVL tree type t = Empty | Node of elt * elt * int and it would be much nicer to use a parameterized type 'a t = Empty | Node of 'a * 'a * int, a representation which is truly polymorphic and that enable us to write generic code that will work equally well with maps and sets over all types of elements. Indeed, there is no real and fundamental reason that should prevent us from writing such code and this is why so many people are pissed off with OCaml’s maps and sets. They just can’t understand why we have different types for each set, and no polymorphic functions such as length, etc. And the main culprit here is the usage of functors! Functors have a very strong typing discipline that mandates every application of a functor to manifest a new type, even if has the same representation. This makes functors a very powerful device that enables us to express things like Girard paradox, a very questionable feature in the real life. But this fundamental approach makes functors so unusable in real life as you have to pay for the option that you don’t need. E.g., in OCaml’s sets, we wanted to parameterize over the behavior (the comparison function), but instead, we also got a new type for each set.

TL;DR; To summarize. OCaml functors are very heavy and should be used only when we need to abstract over type representation. Even for the type representation abstraction, there are other choices that should be considered to prevent premature generalization of the code. Even when functors are used, we should be careful to put the quantifier at the right place, i.e., not to abstract code that is independent over the abstracted entity.

bnguyenvanyen · December 1, 2021, 8:42am

Thank you for the great insights !
I agree, but one of the difficulties is that functors are a bit infectious. Once you step into functor world, everything wants to be a functor. Indeed the code I’m working on does use the Stdlib’s sets and maps, and I guess that’s one of the reasons that led to functorizing most of the code.

I’ll try to see if I can get rid of the functors, but that’s a big piece of work.

I still wonder if there are not usability improvements that would be possible for functors.
The biggest difficulty is not really the verbosity, but the fact that the compiler is not very helpful in showing where a with constraint is missing.
Maybe an optional warning could indicate when a type that is an alias to an input type is left abstract in the return type ?

module Make (X : sig type t ... end) : sig type x ... end = struct type x = X.t ... end
                                           ^^^^^^
Warning: Alias to input type X.t left abstract in the return type of the functor

(Of course, I have no idea whether that is doable or not…)

bnguyenvanyen · December 7, 2021, 3:12pm

So I have experimented with both approaches.

In one case, I added MakeT functors next to my Make functors to build module types with the right type equalities. This did make the code much more consistent, and less verbose, but it remained quite heavy.
I still kept all functor arguments to be safe, even those that could probably be removed.
I think once everything is correctly constrained with the MakeT functors, it would be easy to also remove them and further simplify the code.

In the other, I removed (nearly) every functor, since fundamentally they were used to build Set and Map instances. I used Base’s sets and maps, hiding the comparator type witness since I had no use for it. The result feels a lot simpler, especially for clients of the (previously) functorized library, so we will keep this version.

So thank you both @octachron and @ivg for teaching me new things and helping me write better code !

nico_toll · December 8, 2021, 12:58pm

Hi, I’ve just read your posts and it raises a few questions for me, I would be happy if you could answer them.

If I understand well, your point is that functors primarily allow us to asbtract over type representation and I’m not sure I agree with this. For me, functors are functions at type level from modules to modules, that is from {set of types and values} to {set of types and values}. It seems to me that the key point is that we can not only parameterize on types BUT also on values.

If we look at the Map example, the actual code of the Stdlib is actually closer to this in essence :

module type OrderedType = sig
type t
val compare: t -> t -> int
end

module Map(Ord: OrderedType) = struct
type key = Ord.t

type 'a t = 
 | Empty
 | Node of {left: 'a t; right : 'a t; v : key}
end

So the Map type is polymorphic with respect to the value type (but not the key type). I believe this differs from what you say, but I may have misunderstood something in your post.

The difference with a Map parameterized over both key and value types :

type ('key, 'val) t =
 | Empty
 | Node of {left: ('key, 'val) t ; right : ('key, 'val) t; v: 'key}

is that this type is actually unsound : two maps should be considered of the same type not only if their key and value types coincide, but if they also use the same comparison function (else a union between two maps that use different comparison functions would yield a non-sensical result). Also, in this case, we have to pass the comparison function each time we use the map type, which is unconvenient. That is, the map type is parameterized not only on two types key and val, but also on a value, namely the value of a function compare of type key → key → int. For me, this has pretty much nothing to do with type representation (but again, I may have misunderstood your point). And this is why a functor of type List does not make as much sense as a Map functor : there is no value we would want to parameterize over (sorry if I don’t use the right vocabulary).

If we look at Janestreet base stdlib, they actually take this into account but handles it differently, if I understand correctly by using a type-level witness that two sets actually uses the same comparator. I believe this solves the exact same problem. Of course, we have no way to know either at compile time or runtime if two comparison functions are equal, so the only way to know if two maps are of the same type is if they come from an application to the same module (syntactically).

There is another point to be made here, which is that in the case of map the comparison function does not matter much as long as everyone uses the same (or at least in most usecases I’m aware of). So either typeclass or implicit would considerably make them easier to use (by saying : if there is a default ordering available for this type, just use it implicitly). Nevertheless, I’m quite confident that in some other cases there are functors that take a couple (type, value) for which the value matters and can be different for the same type.

Again, I may have misunderstood some of your points, and I am not all an expert, so feel free (and anyone else) to correct me on any mistake I could have done. I also apologized if my point was not clear, I’ve done my best but again, I am not an expert on the matter.

ivg · December 8, 2021, 10:03pm

Functors are the most powerful abstraction mechanism in OCaml. This price for this power is the absence of type inference on the functor level and a rather heavy-weight syntax and typing rules. Together it makes it hard to scale functors on larger programs. So yes, definitely functors allow us to parameterize over behavior, but there are many other more lightweight options for that. So the main point is to use the least powerful tool in our arsenal.

Yep, a mistake in my post, I was actually referring to the set implementation. But the same is applicable to maps as well, having values doesn’t change anything. And I am not saying that using functors for maps and sets in OCaml is wrong or bad. I am actually giving them as an example when functors are justified for parameterization over behavior. At the same time, this example shows how hard it is to use such interfaces and how they affect the end-user code.

Probably you thought that I am against functors and especially against functors for Sets and Maps. That’s not true, though I still prefer maps and sets from Base, with polymorphic type and witnessed equality. My point was that there are plenty of other abstraction mechanisms that are much easier to use than functors and that the latter should be used only if nothing better works.

n4323 · December 10, 2021, 10:22am

I got into similar trouble with stacks of functors recently; in my case it was not obvious how/where to make sure that equality of types between different functor applications is known to the type system. (I asked in a thread on here as well).
What would be helpful for me for future projects would be some sort of a rule of thumb that lets me go through the alternatives to functors to see if they apply when programming. What are the alternatives? When to use each?
Is there such a resource somewhere?

ivg · December 10, 2021, 10:33pm

The main alternative is not to use functors Just use plain modules and abstract as much as possible. Think of OCaml compilation units as a built-in easy-to-use functor system. Indeed, any compilation unit in OCaml is an implicit functor parameterized by all modules that it references.

Suppose you have a module that requires a GPS service. The initial idea could be to use a functor, e.g.,

module type GPS = sig 
   type t 
   type position
   type time
   val create : string -> t
   val time : t -> time
   val position : t -> position
end

module Tracker(Gps : GPS) = struct 
   type point = {
     pos : Gps.position;
     time : Gps.time;
   }

   type t = {
     gps : Gps.t;
     track : point list
   }

   let next {gps} = {
     time = Gps.time gps;
     pos = Gps.position gps;
   }

   let step t = {
      t with track = next t :: t.track
   }
end

So far so good, and this would be probably the code that any novice in OCaml will write¹ just after he or she got introduced to functors. It is so powerful, we can implement a tracker without actually having any implementation of the GPS service. And we can easily test our tracker using a stub GPS, it is so nice and clean. Two weeks later, after fighting over tons of position is not equal to position errors, splitting position and time from GPS, and figuring out that create shall not be a part of the interface of the GPS service (different GPS services might have different ways of constructing) we might come to a conclusion that idea of using functors was not that great.

In the end, we had to accept that we need to use only one time and one position. And that we actually need to select the GPS service in runtime. So what options do we have?

The simplest option is just to use a closure, e.g.,

type gps = {
   time : unit -> time;
   position : unit -> position;
}

let stub = {
   time = (fun () -> Time.now ());
   position = (fun () -> Position.zero ());
}

let trimble port = 
   let dev = Trimble.connect port in {
   time = (fun () -> Trimble.get_time dev);
   position = (fun () -> Trimble.get_position dev);
}

But it was easy as all operations in the interface were simple accessors. What if we need to change the state of the GPS receiver. Suppose we want to be able to select which GNSS is used, e.g., we want an operation switch : system -> gps -> gps. We can use existentials for that!

type 'gps service = {
  self : 'gps;
  time : 'gps -> time;
  position : 'gps -> position;
  switch : 'gps -> system -> 'gps;
}
type gps = Gps : 'gps service  -> gps


let time (Gps {time; self}) = time self
let switch system = function Gps ({switch; self} as unit) ->
  Gps {unit with self = switch self system}

In fact, we can also go old-school and use objects, e.g.,

type gps = <
  time : time;
  position : position;
  switch : system -> gps
>

let stub : gps = object
  val system = Glonass
  method time = Time.now ()
  method position = Position.zero
  method switch system = {< system >}
end

But this is a completely different topic

If the interface of the GPS service becomes too complex we might decide to pack it into a first-class module and wrap into a GADT existential, e.g.,

module type GPS = sig
  type t
  val time : t -> time
  val position : t -> position
  val switch : system -> t -> t
end

type 'a service = (module GPS with type t = 'a)

type gps = GPS : {
    self : 'a;
    intf : 'a service;
  } -> gps


let switch system = function GPS {self; intf=(module GPS)} ->
  GPS {self = GPS.switch system self; intf=(module GPS)}

And we can hide the internal representation of the GPS as an existential in a single module and use dependency injection (possibly with plugins) to create various gps services so that in the end, our tracker is independent on time, location, and gps implementation, and is no longer required to be a functor.

Note, that if we would not need to select the GPS service in runtime, we could simply keep it as a compilation unit and use Dune variants. And if there is only one service, then obviously we could keep everything as compilation units and gradually upgrade our GPS service from a stub to something real.

¹⁾ I did

Kakadu · December 11, 2021, 12:25pm

I’m worried that after reading this thread potential OCaml wannabe may start thinking about trying F#/Purescript/Haskell where this monstrosity called functors doesn’t exist…

It would be great to finish thread on positive note.

cvine · December 11, 2021, 4:05pm

I am not sure about “monstrosity”? Similar things can (and are) said of compile-time generics for languages which support them, particularly with respect to for example template meta-programming in C++. When used judiciously they are fine. But if you base a design on them inappropriately, and later find that what you thought were things known at compile time end up having to be established dynamically at run time, or find that the sheer complexity becomes self-defeating, then you are going to have to do some recoding.

There is a lot to be said for the flexibility of using what amounts to proxy or traits objects implementing an interface in the way proposed: I thought it was well put in another article by the same author: The shape design problem - #39 by ivg . But I have never seen any problem with using functors for, for example, establishing the key type and comparator function for sets and maps in the Stdlib style[1]; and I have thought functors useful for implementing what amounts to simple type classes in ocaml, which may become more useful still if modular implicits were available.

[1] I know that the Jane Street equivalent passes the comparator in as a function argument so that the key type can be universally quantified, so avoiding functors, and instead enforces type discipline via a phantom comparator type, but as I have never used the Jane Street libraries I may well be missing out on some issue which led to his approach.

bluddy · December 11, 2021, 4:30pm

The main downside I’m aware of is that applying functors requires naming things. So you have IntMap and StringMap, but those names are purely user-dependent. What do you call a Map whose key is an (int * string) pair? An IntStringMap? An IntStringPairMap? You can see how it becomes a little hairy, but it honestly doesn’t bother me and I like the clean functorized way the stdlib does it (not to mention that it’s become a standard).

My general advice is to always keep hierarchies (of all sorts) as shallow as possible. Dune variants (which @ivg mentions above) are an excellent way to remove one of the applications of functors, which is selecting a compiled-in implementation of an abstract module. Class hierarchies can easily get messy and constraining in OOP. The same is true for functor hierarchies, and so they should be used sparingly. I’m aware of a company that essentially dumped OCaml because the engineers created such a deep hierarchy of functors that no part of the system was able to be modified sufficiently for their needs anymore. Deep hierarchies are generally a code smell and alternatives should be pursued.

thomas_ridge · December 11, 2021, 10:17pm

Functors and information hiding can result in very rigid code hierarchies where almost nothing can be achieved without a lot of effort. But the same goes for OO style class hierarchies as well. Perhaps the advice is: Make meaningful interfaces, but not too many.

mjambon · December 12, 2021, 3:00am

some solutions I’ve used successfully:

Alternative 1: attach the parameters to the data structure

For example, a set would be implemented as something like:

type 'a set = ... (* some tree *)

type 'a t = {
  compare: 'a -> 'a -> int;
  set: 'a set;
}

instead of just

(* 'compare' function is in the environment, brought over by
   the module passed as an argument to the functor *)
type t = ... (* some tree *)

The interface only exposes an abstract type 'a t:

type 'a t

Binary functions such as union : 'a t -> 'a t -> 'a t have to check that both sets use identical parameters e.g.

let union a b =
  assert (a.compare == b.compare); (* physical equality *)
  let compare = a.compare in
  ...

I don’t think it’s a big deal in general.

One downside is performance since each time a new set is created, a new record must be allocated.

Another downside is that interfaces become more complicated due to type parameters that wouldn’t be there after a functor application ('a Generic.t vs. Specialized.t), but it’s possible to create type aliases so that’s not really a problem. Of course, we don’t create a new module each time we want to use new parameters, so overall it reduces the total size of interfaces compared to the functor-based solution.

Alternative 2: pass parameters explicitly over and over

Unlike alternative 1 above, the data structure is not wrapped into a record that holds the parameters. The parameters have to be passed explicitly to each function provided by the module.

This should usually be better than alternative 1 in terms of performance. It may be a little more cumbersome to use due to passing the extra parameter. The extra parameter may be needed for just one uncommon function call, in which case it shouldn’t be a big deal.

Alternative 3: late initialization of global parameters

This is useful for things like a logger, which is global to the application and is not subject to change when testing and such.

let logger = ref None

let set_logger f =
  match !logger with
  | None -> logger := Some f
  | Some _ -> failwith "logger is already initialized"

let get_logger () =
  match !logger with
  | None -> failwith "no logger found: application must set one"
  | Some f -> f

let log msg = (get_logger ()) msg

A module that uses a logger can define functions that call log. No need for a functor to specify the logger.

lindig · December 12, 2021, 7:44pm

I think this is a good point. Heavily functorised code is created with the goal of flexibility and reusability but as you point out, the opposite may happen: it becomes a maze that is so rigid that this goal can’t be achieved any more. I am quite skeptical of deeply nested functors that have more module-level code than base-level code.

rand · December 13, 2021, 10:51am

I think it’s important to keep in mind that there isn’t “one solution to rule them all”. One should always weigh the qualities of each possible tool to solve the problem at hand. This enables one to write better code specialized to the problem - but also enables finding new usecases for the tools.

An example I found of the latter case, is to mix FRP with functors as first-class modules. This way I could start with writing straight forward toplevel reactive values separated out in ordinary modules - and when I suddenly needed to dynamically construct/destruct parts of the FRP graph, I could simply functorize my modules; replacing the reactive dependencies with functor-arguments.

n4323 · December 13, 2021, 12:47pm

Indeed. I’m currently finding myself needing a map with consecutive integer keys that can be updated efficiently. I was going to compare the performance of IntMap from containers-data and sek which both offer a superset of what my datastructure needs. So I wrote a signature TABLE and wrapper modules Table_IntMap and Table_Sek that adapt these libraries to TABLE. My application code is now functorized over an argument Table : TABLE. This seemed quite natural to me.

After reading this comment and others in the thread I might consider using dune ~~variants~~ virtual libraries instead, as the choice of table implementation is really only made once, after benchmarking. Later uses of the application most likely will not need the flexibility of changing implementations or using both concurrently.

bnguyenvanyen · December 14, 2021, 9:55am

If you’re just going to use both implementations temporarily for benchmarking, I would say you do not even need dune virtual libraries.
You can just write your two wrapper modules, then one module Table and use two branches. In one branch Table is include Table_IntMap and in the other include Table_Sek and then you do your comparison.

Topic		Replies	Views
Inline module expr Ecosystem compiler	15	1678	November 4, 2020
Help me to understand functors Learning functor	23	6387	September 13, 2018
Functorial vs "comparator" container interfaces? Ecosystem core , base , core_kernel	12	2341	April 16, 2020
Functors for dependency injection? Learning testing	37	4869	May 26, 2020
Modular Implicits Ecosystem compiler , faq	63	23764	October 24, 2022

Functor trouble -- How to organize functor-heavy code?

Alternative 1: attach the parameters to the data structure

Alternative 2: pass parameters explicitly over and over

Alternative 3: late initialization of global parameters

Related topics