Good code patterns for sharing algorithms on slightly different types

I’m struggling to find a good code pattern to share algorithms code across slightly different types. For example, a library for sets and maps using a similar tree structure:

type set = Leaf | Node of key \* set \* set
type 'a map = Leaf | Node of key \* 'a \* 'a map \* 'a map

(the real example is the patricia-tree library for one, but I also have other use cases in mind, such as a union-find library with multiple variants, which may-or-may-not include a rank or size field depending on union strategy).

Because most functions (find, iter, map…) heavily and recursively pattern match on both types, a lot of logic can be shared, but doing so is non-trivial. Here is a short list of what came to mind:

  • Duplicate the code, which is a non-solution, but is what Stdlib does for Set and Map. I specifically want to avoid this as shared code is easier to maintain.

  • Implement sets as unit maps directly, which I want to avoid for memory use reason.

  • Use meta-programming (i.e. with CPPO or similar tools). This is probably the solution with the best “performance” which avoids duplication, but it is very heavy to use (every pattern matching branch is wrapped in macro calls…). It also breaks the LSP, which does not help with maintainability.

  • Use a view type: instead on matching on the stored type x, match on view x where view casts to a common type (id for maps, cast to a unit map for sets), and define all the shared code in a functor that takes the type and view as a parameter. This is the solution we use in patricia-tree. It allows great code sharing without storing the extra unit value for sets. However, it noticeably degrades performance for sets, because of the extra allocations needed. Unfortunately, I’ve been unable to force inlining of the view which may alleviate this problem, because the view function is a functor parameter.

  • Use a view matching: essentially the same as above, but instead of casting to a common type, each implementation would define their own pattern matching function with the same interface: view_match: leaf:(unit -> 'a) -> node:(key -> 'b -> 'b t -> 'b t -> 'a) -> 'a. This avoids the problem of allocating the intermediate type, but is very heavy to use, and makes things like pattern matching on pairs hard.

I’m wondering if anyone can propose alternative ways (or improvements to those I mentioned here, such as how I could force inline the `view` calls).

This seems like a good potential use case for the new Module-dependent functions introduced in OCaml 5.5. You would need to define some common module type interface still. I don’t know if it would improve your performance concerns but it would probably be more syntactically lightweight than the view type solution. Parameterize the view function over a common module type implemented by both modules.

Alternatively, perhaps this is a good case for objects? These come with their own performance concerns but I don’t think memory usage is their main issue. If you need incremental refinement and polymorphism for free, objects are a good choice. I think OCamlers are sometimes too reticent to reach for objects. Objects are fun!

You would still need to define some kind of interface type, and I don’t believe object methods can inline, but you would likely avoid extra allocations.

I hadn’t considered using these, thanks for mentioning them.

  • Module dependent functions sound interesting, they seem to essentially be a lightweight functor with statically known arguments, so could maybe force inlining this way. I’ll admit I’m a bit confused by them as the syntax looks identical to functions taking first-class modules as arguments, which are not statically know and so cannot be inlined. Also, from past experience, function with first-class module arguments ant parameterized types like 'a t do not play well together…
  • You could absolutely do this with objects, by having the view functor parameter be a method, but I don’t see how that would help. Dynamic dispatch would solve code duplication in the binary as well as the source, as now each function only has a single implementation which dynamically selects the view to call, but the instantiating the functor twice works just as well. I don’t care about code duplication in the binary, in fact I prefer it if it is more efficient. For my view inlining problem: its theoretically possible for the compiler to realize the functor call is static, and therefore inline it and simplify its internals. It would not be able to do that with objects.

The difference between the module-dependent function and plain first-class modules is that the return type of the module-dependent function can vary according to the type provided by the module argument. I think that is precisely to help with stuff like parameterized types not previously playing well together with first-class modules.

I’ve had similar issues with sharing algorithms before, and I don’t think there’s a perfect solution (high performance + high abstraction usually gives you very wordy code or a very complicated build, using code generators, etc)

However, for performance, the main help is to ensure that you exclusively use compilation with flambda (preferably -O3). Mostly, deep abstraction can be worked around by aggressive inlining, and classic compilation is unable to inline certain calls, even if you try to force it. Flambda really comes to the rescue. (you do have to be careful not to have a code explosion, so it’s prudent to examine the output, normally by looking at the asm of a compiled binary).

I would generally attempt to use functors (for example, a general fold functor that could be used to implement all fold-like functions), and make sure to not allocate closures when it can be avoided. It’s generally easier to avoid extra indirect function calls (through caml_apply) by encapsulating your fixed logic in a functor.

e.g. I have a similar library where I implement mem/find_opt/find with a functor like this, which, with flambda, and correct [@inline always] annotations, gives performance identical to hand-written functions. user in this case is an additional argument threaded to found/missing that can be used to pass something without allocating a closure.

  module Mem = M.Make_find(struct
      type 'a user = unit
      type 'a return = bool
      let found _ _ _ = true
      let missing _ _ = false
    end)

(See here for a partial example: proof_balanced/bcaml/triple.ml at f3e89c4e603716f8446aa2947714ad1c3fb3dd40 · smuenzel/proof_balanced · GitHub)

In addition, encoding your data types with some phantom types and using GADTs appropriately can help reducing the burden on performance and annoying code repetition.

type 'a node_kind = | Leaf : [ `Leaf ] node_kind | Node : [ `Node ] node_kind

You can match on this type, and then you are able to access elements of the node without re-matching, by having node specific accessors

Your main data type should be a packed existential for this to work:

type ('a, 'v) node =
| Leaf : ([ `Leaf], _) node
| Node : { left : 'v t; key : Key.y; value : 'v; right : 'v t } -> (['Node], 'v) node
and
'v t = T : (_, 'v) node -> 'v t [@@unboxed] 

You have no overhead if you use [@@unboxed]. See here for a partial example: proof_balanced/bcaml/triple.ml at f3e89c4e603716f8446aa2947714ad1c3fb3dd40 · smuenzel/proof_balanced · GitHub

In the end, you probably have to combine several techniques to make this work in a nice way.

One final remark: Prepare some benchmarks. Performance can be very non-intuitive!!!

A good example of this is your Leaf/Empty constructor – in my own tree, I had to replace it:

type t = | Empty | ...
(* turned into *)
type t = | Empty of unit | ...

Which massively improved performance on certain matches (calculating the weight of the tree), since each match lost an is_int check, and was then simplified to an arithmetic function.

The packed existential is a great idea. I have also used this pattern.

As I understand it module dependent functions allow other arguments or the return value of the function to be deduced by reference to the first class module argument. Take the example in the manual:

let sort (module MSet : Set.S) li =
  MSet.elements (List.fold_right MSet.add li MSet.empty)

With OCaml-5.5 this will compile and have its type deduced as (module MSet : Set.S) -> MSet.elt list -> MSet.elt listand successfully compile. OCaml 5.4 will have a stab at compiling that function by reference to a notional type of (module Set.S) -> 'a -> 'a and fail with:

2 |   MSet.elements (List.fold_right MSet.add li MSet.empty)
                                              ^^
Error: This expression has type 'a but an expression was expected of type
         MSet.elt list
       The type constructor MSet.elt would escape its scope

Yes, however the set example is a poor highlight of this because you could already do this by introducing an explicit type variable:

let sort (type a) (module MSet : Set.S with type elt = a) li =
  MSet.elements (List.fold_right MSet.add li MSet.empty)

However, you can now also do this with parameterized types, as Josef_Throrne mentionned, for which the explicit variable trick does not work:

let map_sum (module M: Map.S) f m =
    M.fold (fun k v s -> f k v + s) m 0

Here the function has type (module M : Map.S) -> (M.key -> 'a -> int) -> 'a M.t -> int, which you cannot do with explicit type variables (you can’t have a type 'a t = v constraint to module type signatures).

So if I understand this right, module-dependent functions is just a new name for the modular explicits, which I’ve seen discussed quite a bit before. Its great to finally have them!

Good point about the example given in the manual. Here is an entirely trivial function without parameterized types counting the number of items in a set which does not appear to compile with OCaml 5.4 (locally abstract types are I think not going to help), and so requires OCaml 5.5:

let count (module M: Set.S) set =
  M.fold (fun _ acc -> acc + 1) set 0 

Aren’t modular implicits intended to enable the compiler to pick first class modules for you by reference to the type of other arguments? (I’ve not been following the discussion.) Edit: Ah, I see you are referring to modular explicits. Those I haven’t come across.

This example works fine with slightly more complex encoding of first-class modules

module type Set_val = sig
   module Set : Set.S
   val x: Set.t
end
let count (module Sv: Set_val) =
  Sv.Set.fold (fun _ acc -> acc + 1) Sv.x 0

And in general module-dependent functions can be encoded as first-class modules
(at worse using a first-class functor), the complexity of the encoding often makes this quite unpractical however.

let count (type s) (module M: Set.S with type t=s) set =
  M.fold (fun _ acc -> acc + 1) set 0

?
(works with 4.14)

You can also still use locally abstract types here as illustrated above. But you can’t do that for maps (they key difference is the parameterized 'a t type, which you cannot constrain in this way, since you can’t have locally abstract parameterized types). Although I suppose the wrapping module trick also works.

For your second comment, there is a difference between modular explicits and modular implicits, the former is essentially module dependent function, which are required for implicits. Implicits would allow omitting the module argument and use typeclass-like resolution to pick the correct implementation based on the types of other parameters.

This is tricky for libraries:

  • If you add ocaml-option-flambda as a dependency, then it can be annoyingly restrictive to users, especially if they don’t happen to care about performance or use parts of the library where the flambda optimizations are not necessary.
  • If it’s not a dependency, then it’s very easy for users to use it in a non-flambda switch and have worse performance than expected or advertised, especially if the library is a transitive dependency.

For this use case, it’d be very nice if flambda wasn’t such a global choice but controllable by compiler options as well. Then the flambda-desiring library could ask itself to be compiled with flambda while the rest could be non-flambda. But I guess there’s a good reason flambda and non-flambda can’t be mixed like this.

Personally, I think we should just remove classic mode, and force everyone to use flambda. It’s basically just better – and compilation performance is not too much worse (and maybe possible to improve)

The restrictions you outline really make it difficult to make high-performance libraries with sufficient abstraction to be used by the community in general.

This makes a lot of sense to me. The default in C/C++ is not -O0, and I don’t think we should want that to be the default in OCaml either. We don’t need to ‘force’ anyone, just to change the default.

Also, it seems like we’ve really turned a page with ‘modular explicits’. It would be nice to have some blogs and such going into what’s both possible and optimal with the new design space.