Functorial vs "comparator" container interfaces?

I wonder whether there has been any performance evaluation of e.g. Set and Map that compares the functorial (Stdlib Make-functor style) and “comparator” (Base-style) interfaces. To be concrete, suppose we define a Make functor over the tree data structure underlying the Base/Core Map module (as below) to obtain a type whose values are just trees, where the compare function is obtained from the functor argument rather than from the individual values.

Is it immediately obvious that this functorial interface should have the same performance as the comparator-style interface exposed by Base by default? I am thinking about e.g. the compiler’s ability to inline compare functions in the two cases. With the comparator interface, IIUC there is only one copy of the compiled code for e.g. Map, and furthermore the compare function needs to be read out of a record field each time the external interface boundary is crossed. So it seems “unlikely” that the compare functions can be inlined. On the other hand, with the functorial interface, is there at least the chance that the compare function could be inlined?

There is also a question about whether it matters that there is an “extra” indirection every time the external interface is crossed (due to the toplevel record containing the compare closure and tree itself). For instance, naively it seems that implementing Map.of_alist outside of Map would need to do perhaps-meaningfully more work than defining it inside Map. Has this sort of thing been evaluated and understood to be a non-issue?
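To make that indirection concrete, here is a hand-rolled sketch (not Base's actual representation) of what a comparator-style map value roughly looks like: a record pairing the compare closure with the underlying structure, so accessors must read the closure out of a field before calling it.

```ocaml
(* Hand-rolled sketch, NOT Base's actual definitions: a comparator-style map
   pairs the data with a first-class compare closure. An association list
   stands in for the real balanced tree to keep the sketch short. *)
type ('k, 'v) comparator_map = {
  compare : 'k -> 'k -> int;  (* fetched from the record on each operation *)
  tree : ('k * 'v) list;      (* stand-in for the balanced tree *)
}

let mem (m : ('k, 'v) comparator_map) (key : 'k) : bool =
  (* [m.compare] is an unknown closure at this call site, which is what
     makes inlining it hard without aggressive constant propagation. *)
  List.exists (fun (k, _) -> m.compare key k = 0) m.tree
```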

I’m mostly wondering whether anyone has expectations based on an understanding of the compiler, or whether anything has already been learned from experiments with recent compilers (e.g. flambda).

For other context regarding functorial interfaces, there is definitely a point in favor of the comparator-based interface: it avoids duplicating compiled code across a quadratic number of functor applications (number of data structures times number of key types), e.g. a String.Map-style module for each key type like String and each data structure like Map.

But, IIUC there are several drawbacks of the comparator-based interface, independent of performance considerations. Comparators are values, in particular records with a field of function type, which means that:

  1. Marshaling values whose representations involve a comparator-based container requires enabling support for closures, which is slower and much more fragile with respect to interoperability (marshaled closures can only be read back by the exact same binary). In use cases I commonly see, there are data structures with a lot of implicit sharing where the structure of the sharing is irregular, so making it explicit in order to use a serialization system that does not support sharing well would be very invasive at the source code level, and a lot of work.
  2. Comparators are extremely inconvenient when used in recursive modules, since the key module will need a comparator, and thereby violate the recursive module safety check. For example, code like an analogue of the manual’s example of a tree with nodes that have sets of children becomes intricate and requires duplicating the signature numerous times.
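For reference, the recursive-module case in item 2 is straightforward with the stdlib functor interface. A sketch along the lines of the manual's example (using Stdlib rather than Base, purely as an illustration of the point):

```ocaml
(* A tree whose nodes carry a set of children, in the style of the manual's
   recursive-module example. With a comparator-based Set, Node would also
   need a comparator *value*, which trips the recursive-module safety check;
   with the functor interface only types and functions cross the recursion. *)
module rec Node : sig
  type t = { value : int; children : NodeSet.t }
  val compare : t -> t -> int
end = struct
  type t = { value : int; children : NodeSet.t }
  let compare a b = Int.compare a.value b.value
end

and NodeSet : Set.S with type elt = Node.t = Set.Make (Node)
```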

Thoughts? Am I the only one who would rather keep the functor-style APIs (in cases where the number of functor applications does not explode problematically)?

Functorial interface to Core.Map:

open Core

module Make (Key : sig
  type t [@@deriving compare, sexp_of]
end) : sig
  type key = Key.t

  module Key : sig
    type t = key

    include Comparator.S with type t := t
  end

  include Core_kernel.Map_intf.Make_S_plain_tree(Key).S

  val compare : ('a -> 'a -> int) -> 'a t -> 'a t -> int
end = struct
  module KeyMap = Core.Map.Make_plain (Key)
  module Key = KeyMap.Key

  type key = Key.t

  include KeyMap.Tree

  let compare = compare_direct
end

I personally prefer comparator-based Map/Set APIs over functor-style APIs, because the latter requires extra boilerplate to set up, and then extra verbosity (using a module name I have to come up with) at every use site. I don’t have any experience with using recursive modules, which may be core to your question.

I haven’t looked into which style is better optimized by the OCaml compiler, but in principle I think they are both typically amenable to inlining. In the comparator case, “constant propagation” of the comparator is enough for global optimization to recognize that the comparator can be inlined into a type-specialized map/set function.

I don’t have any answers about inlining, but I’ve got two more points to add to the design space, and a small feature difference to point out.

There’s another approach, let’s call it a “phantom comparator” approach.
This is when the comparator appears in the type of the map, but doesn’t actually have a runtime representation, so you don’t have a per-map memory overhead.
This is implemented by the Base.Map.Tree module (which Base.Map is built on top of).

The annoying thing here is that a comparator needs to be passed to all accessors.

That is what’s done in Haskell as well. They use type classes instead of a phantom comparator type, but otherwise it’s equivalent. (type classes also make it much less annoying to pass the comparator dictionary everywhere)
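A minimal hand-rolled sketch of the phantom-comparator idea (this is illustrative code, not Base's actual Map.Tree API): the 'cmp parameter exists only at the type level, so a map value is just the tree, no comparator is stored, and every accessor takes the compare function explicitly.

```ocaml
(* Phantom-comparator sketch: no per-map comparator storage; the 'cmp
   parameter only ties maps to a comparison order at the type level.
   An unbalanced binary search tree keeps the sketch short. *)
module Phantom_map : sig
  type ('k, 'v, 'cmp) t
  val empty : ('k, 'v, 'cmp) t
  val set :
    compare:('k -> 'k -> int) ->
    ('k, 'v, 'cmp) t -> key:'k -> data:'v -> ('k, 'v, 'cmp) t
  val find :
    compare:('k -> 'k -> int) -> ('k, 'v, 'cmp) t -> 'k -> 'v option
end = struct
  type ('k, 'v, 'cmp) t =
    | Leaf
    | Node of ('k, 'v, 'cmp) t * 'k * 'v * ('k, 'v, 'cmp) t

  let empty = Leaf

  let rec set ~compare t ~key ~data =
    match t with
    | Leaf -> Node (Leaf, key, data, Leaf)
    | Node (l, k, v, r) ->
      let c = compare key k in
      if c = 0 then Node (l, key, data, r)
      else if c < 0 then Node (set ~compare l ~key ~data, k, v, r)
      else Node (l, k, v, set ~compare r ~key ~data)

  let rec find ~compare t key =
    match t with
    | Leaf -> None
    | Node (l, k, v, r) ->
      let c = compare key k in
      if c = 0 then Some v
      else if c < 0 then find ~compare l key
      else find ~compare r key
end
```

The annoyance mentioned above is visible directly: every call site must thread ~compare through.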

And there’s a fourth approach, let’s call it an “OOP” approach.
You can represent a map with a dictionary that has many fields (one field per supported operation).
This is an approach we use with the Stdune.Table type in dune (https://github.com/ocaml/dune/blob/master/src/stdune/table.ml), and I’m somewhat skeptical about performance, but I’m kind of curious how bad it really is.

As for the feature difference, one major advantage of “comparator” and “phantom comparator” approaches is that it’s easy to write new generic functions that work for any type of map and make assumptions about the two maps using the same order. For example, a type like this can be written down without specializing to a specific key type:

val union : ('k, 'v, 'cmp) Map.t -> ('k, 'v, 'cmp) Map.t -> ('k, 'v, 'cmp) Map.t

For the functor-based approach, I think this might also work, but I’m not 100% sure and it’s definitely not as convenient, so I’ve never seen something like this done:

module Implement_union (K : Map_key) : sig
  val union : 'v Map.Make(K).t -> 'v Map.Make(K).t  -> 'v Map.Make(K).t 
end
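For what it's worth, here is a hedged sketch showing that this does work with the stdlib Map functor, relying on applicative functor semantics (repeated applications of Map.Make to the same module path yield equal types); the Map.OrderedType signature stands in for the Map_key above:

```ocaml
(* Sketch: a generic 'union' for the functor-based approach. Because OCaml's
   functors are applicative, Map.Make(K).t inside the signature is compatible
   with M.t in the body, and with any other application Map.Make(K). *)
module Implement_union (K : Map.OrderedType) : sig
  val union : 'v Map.Make(K).t -> 'v Map.Make(K).t -> 'v Map.Make(K).t
end = struct
  module M = Map.Make (K)

  (* On key collisions, keep the binding from the first map. *)
  let union a b = M.union (fun _key v _ -> Some v) a b
end
```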

For the OOP approach, that simply cannot work.


Very good point about ease of writing generic operations. A related one is how the comparator approach enables light-weight implementation of “algorithms” that use e.g. a Set, without needing to fix which Set data structure to use, and without needing to “functorize the world” so to speak.

I don’t have a good understanding of how often being able to make assumptions on the order is useful. My unscientific feeling is that in most of the cases I have come across, access to the underlying representation is also necessary: for example, to implement a single-traversal combined find-and-remove operation. This sort of thing is likely to be sensitive to the application domain, though.

I’m also curious about the performance of the “OOP” approach. I don’t have any experience with anything similar.

I don’t understand why you’re saying the comparator approach lets you abstract over the data structure. I think what distinguishes it from the OOP approach is that the data structure is fixed and it’s only the element comparison function that differs between sets.

For example, if you have a function that takes a ('k, 'v, 'c) Base.Set.t, you can only give that function an AVL tree with a specific representation Base.Set defines and not anything else.


We did such microbenchmarking. We compared the standard library Map and Hashtbl functors with Base’s comparator-style maps and hash tables, and with the same structures created using the Core functors.

We were using OCaml 4.09.0 and Core v0.12.0, and tried three different compilation options: (1) without any optimization, (2) with -O3, and (3) with -O3 on an flambda switch.

The main takeaways: the standard library maps and hash tables are consistently slightly faster (~10%) than their Jane Street counterparts without flambda, and are significantly faster (nearly twice as fast) when flambda is enabled.

The experiment

We did a very simple test (representative of our problem) that inserts N integers into the map and then looks up the same integers. Each test case is named <Source>.<Structure>:<Size>, where <Source> is either Caml (the standard library), Base (the Base library), or Core (the functor interface, e.g., Map.Make(Int); we also tried the Map.Int module but didn’t find a significant difference from Map.Make(Int)).

Here are the numbers:

1) vanilla compiler, no specific optimization options.

┌────────────────────┬─────────────┬───────────────┬───────────────┬───────────────┬────────────┐
│ Name               │    Time/Run │       mWd/Run │      mjWd/Run │      Prom/Run │ Percentage │
├────────────────────┼─────────────┼───────────────┼───────────────┼───────────────┼────────────┤
│ Caml.Map:16        │      2.52us │       371.02w │               │               │            │
│ Caml.Map:256       │     90.51us │    12_737.58w │        42.08w │        42.08w │      0.10% │
│ Caml.Map:65536     │ 74_726.08us │ 6_440_940.17w │ 1_733_335.64w │ 1_733_335.64w │     80.61% │
│ Base.Map:16        │      2.39us │       563.03w │         0.13w │         0.13w │            │
│ Base.Map:256       │     91.12us │    19_004.83w │        53.00w │        53.00w │      0.10% │
│ Base.Map:65536     │ 82_342.61us │ 9_626_675.28w │ 1_786_030.13w │ 1_786_030.13w │     88.82% │
│ Core.Map:16        │      2.78us │       557.03w │         0.14w │         0.14w │            │
│ Core.Map:256       │    103.61us │    18_998.83w │        53.22w │        53.22w │      0.11% │
│ Core.Map:65536     │ 92_704.59us │ 9_626_671.81w │ 1_784_322.55w │ 1_784_322.55w │    100.00% │
│ Caml.Hashtbl:16    │      2.01us │       177.01w │               │               │            │
│ Caml.Hashtbl:256   │     43.79us │     2_815.08w │        12.40w │        12.40w │      0.05% │
│ Caml.Hashtbl:65536 │ 17_783.56us │   590_934.33w │   392_225.13w │   262_163.13w │     19.18% │
│ Base.Hashtbl:16    │      2.16us │       108.00w │               │               │            │
│ Base.Hashtbl:256   │     50.56us │     2_547.07w │        16.21w │        16.21w │      0.05% │
│ Base.Hashtbl:65536 │ 21_912.52us │   522_319.67w │   651_433.70w │   520_865.70w │     23.64% │
│ Core.Hashtbl:16    │      2.15us │       100.00w │               │               │            │
│ Core.Hashtbl:256   │     51.14us │     2_539.07w │        16.84w │        16.84w │      0.06% │
│ Core.Hashtbl:65536 │ 22_198.09us │   522_311.89w │   651_436.44w │   520_868.44w │     23.94% │
└────────────────────┴─────────────┴───────────────┴───────────────┴───────────────┴────────────┘

2) vanilla compiler, -O3 option

┌────────────────────┬─────────────┬───────────────┬───────────────┬───────────────┬────────────┐
│ Name               │    Time/Run │       mWd/Run │      mjWd/Run │      Prom/Run │ Percentage │
├────────────────────┼─────────────┼───────────────┼───────────────┼───────────────┼────────────┤
│ Caml.Map:16        │      2.63us │       371.02w │               │               │            │
│ Caml.Map:256       │     95.79us │    12_737.58w │        41.88w │        41.88w │      0.11% │
│ Caml.Map:65536     │ 77_185.38us │ 6_440_940.17w │ 1_733_335.64w │ 1_733_335.64w │     89.80% │
│ Base.Map:16        │      2.44us │       563.03w │         0.13w │         0.13w │            │
│ Base.Map:256       │     90.29us │    19_004.83w │        52.90w │        52.90w │      0.11% │
│ Base.Map:65536     │ 81_299.52us │ 9_626_675.08w │ 1_786_544.39w │ 1_786_544.39w │     94.59% │
│ Core.Map:16        │      2.69us │       557.03w │         0.14w │         0.14w │            │
│ Core.Map:256       │    101.39us │    18_998.83w │        52.94w │        52.94w │      0.12% │
│ Core.Map:65536     │ 85_951.43us │ 9_626_672.41w │ 1_782_706.67w │ 1_782_706.67w │    100.00% │
│ Caml.Hashtbl:16    │      1.95us │       177.01w │               │               │            │
│ Caml.Hashtbl:256   │     43.34us │     2_815.08w │        12.42w │        12.42w │      0.05% │
│ Caml.Hashtbl:65536 │ 18_014.40us │   590_933.91w │   392_225.34w │   262_163.34w │     20.96% │
│ Base.Hashtbl:16    │      2.16us │       108.00w │               │               │            │
│ Base.Hashtbl:256   │     50.72us │     2_547.07w │        16.13w │        16.13w │      0.06% │
│ Base.Hashtbl:65536 │ 22_109.95us │   522_319.57w │   651_427.30w │   520_859.30w │     25.72% │
│ Core.Hashtbl:16    │      2.14us │       100.00w │               │               │            │
│ Core.Hashtbl:256   │     49.79us │     2_539.07w │        16.84w │        16.84w │      0.06% │
│ Core.Hashtbl:65536 │ 21_654.90us │   522_311.75w │   651_438.06w │   520_870.06w │     25.19% │
└────────────────────┴─────────────┴───────────────┴───────────────┴───────────────┴────────────┘

3) flambda compiler, -O3

┌────────────────────┬─────────────────┬───────────────┬───────────────┬───────────────┬────────────┐
│ Name               │        Time/Run │       mWd/Run │      mjWd/Run │      Prom/Run │ Percentage │
├────────────────────┼─────────────────┼───────────────┼───────────────┼───────────────┼────────────┤
│ Caml.Map:16        │        882.79ns │       360.02w │               │               │            │
│ Caml.Map:256       │     39_562.08ns │    12_726.58w │        41.68w │        41.68w │      0.05% │
│ Caml.Map:65536     │ 53_809_874.70ns │ 6_440_930.79w │ 1_735_117.84w │ 1_735_117.84w │     64.61% │
│ Base.Map:16        │      1_815.46ns │       546.03w │         0.12w │         0.12w │            │
│ Base.Map:256       │     79_654.15ns │    18_987.81w │        53.00w │        53.00w │      0.10% │
│ Base.Map:65536     │ 79_160_991.15ns │ 9_626_659.73w │ 1_787_850.52w │ 1_787_850.52w │     95.04% │
│ Core.Map:16        │      2_102.74ns │       546.03w │         0.12w │         0.12w │            │
│ Core.Map:256       │     90_244.58ns │    18_987.81w │        52.78w │        52.78w │      0.11% │
│ Core.Map:65536     │ 83_290_490.19ns │ 9_626_659.10w │ 1_786_810.45w │ 1_786_810.45w │    100.00% │
│ Caml.Hashtbl:16    │        457.38ns │       150.00w │               │               │            │
│ Caml.Hashtbl:256   │     13_812.19ns │     2_548.06w │        11.49w │        11.49w │      0.02% │
│ Caml.Hashtbl:65536 │ 10_470_655.56ns │   525_380.79w │   392_212.18w │   262_150.18w │     12.57% │
│ Base.Hashtbl:16    │      1_513.58ns │        89.00w │               │               │            │
│ Base.Hashtbl:256   │     40_135.58ns │     2_528.09w │        15.58w │        15.58w │      0.05% │
│ Base.Hashtbl:65536 │ 19_374_446.48ns │   522_301.02w │   650_583.30w │   520_015.30w │     23.26% │
│ Core.Hashtbl:16    │      1_534.87ns │        89.00w │               │               │            │
│ Core.Hashtbl:256   │     40_766.55ns │     2_528.09w │        15.58w │        15.58w │      0.05% │
│ Core.Hashtbl:65536 │ 19_377_124.52ns │   522_300.77w │   650_544.29w │   519_976.29w │     23.26% │
└────────────────────┴─────────────────┴───────────────┴───────────────┴───────────────┴────────────┘

Conclusions

As we can see, with flambda both the maps and the hash tables of the standard library are significantly faster: roughly 54 ms vs 83 ms for maps and 10 ms vs 19 ms for hash tables. We can also see that the Jane Street times barely changed between the flambda and non-flambda versions, hence the increased difference. We can’t really tell whether this is because flambda has more inlining opportunities for functors, or because the flambda build of the Jane Street libraries was compiled without the optimization information while the standard library build system is somehow specialized for flambda. (It is important to note that when an flambda switch is used but -ON with N>=1 is not specified, the generated code doesn’t differ in performance from the code generated by the normal compiler, so it is usually a good idea to set -O2 by default when packaging your project.)

Source code

open! Core
open Core_bench

let benchmark = Bench.Test.create_indexed ~args:[ 16; 256; 65536 ]

module type Ints = sig
  type t
  val empty : unit -> t

  val add : int -> t -> t
  val mem : int -> t -> bool
end

module type Tests = sig
  type t
  val create : int array -> t
  val count : int array -> t -> int
end

module MakeTest(Map : Ints) : Tests = struct
  type t = Map.t
  let create inputs =
    let init = Map.empty () in
    Array.fold inputs ~init ~f:(fun map x -> Map.add x map)

  let count inputs map =
    Array.fold inputs ~init:0 ~f:(fun sum x ->
        if Map.mem x map then sum + 1 else sum)
end

module SM = MakeTest(struct
    module Map = Caml.Map.Make(Int)
    type t = unit Map.t
    let empty () = Map.empty
    let add k m = Map.add k () m
    let mem = Map.mem
  end)

module CM = MakeTest(struct
    module Map = Map.Make(Int)
    type t = unit Map.t
    let empty () = Map.empty
    let add k m = Map.set ~key:k ~data:() m
    let mem k m = Map.mem m k
  end)

module BM = MakeTest(struct
    type t = unit Base.Map.M(Int).t
    let empty () = Base.Map.empty (module Int)
    let add k m = Base.Map.set ~key:k ~data:() m
    let mem k m = Base.Map.mem m k
  end)

module BH = MakeTest(struct
    type t = unit Base.Hashtbl.M(Int).t
    let empty () = Base.Hashtbl.create ~size:16 (module Int)
    let add k m = Base.Hashtbl.set m ~key:k ~data:(); m
    let mem k m = Base.Hashtbl.mem m k
  end)

module SH = MakeTest(struct
    module H = Caml.Hashtbl.Make(Int)
    type t = unit H.t
    let empty () = H.create 16
    let add k m = H.add m k (); m
    let mem k m = H.mem m k
  end)

module CH = MakeTest(struct
    module H = Hashtbl.Make(Int)
    type t = unit H.t
    let empty () = H.create ~size:16 ()
    let add k m = H.set m k (); m
    let mem k m = H.mem m k
  end)

let (>:) name (module Test : Tests) =
  benchmark ~name @@ fun len ->
  let input = Array.init len ~f:ident in
  let random_state = Base.Random.State.make [|len|] in
  Array.permute ~random_state input;
  stage (fun () ->
      let map = Test.create input in
      assert (Test.count input map = Array.length input))


let () =
  Command.run @@ Bench.make_command [
    "Caml.Map" >: (module SM);
    "Base.Map" >: (module BM);
    "Core.Map" >: (module CM);
    "Caml.Hashtbl" >: (module SH);
    "Base.Hashtbl" >: (module BH);
    "Core.Hashtbl" >: (module CH);
  ]

Compiled with

ocamlfind ocamlopt -O3 -thread -package core_bench bench.ml -linkpkg -o bench

I did a few similar micro benchmarks around the same time and noticed similarly that functor-heavy libraries were significantly sped up by flambda (my own pet projects as well as OCamlGraph). I cannot remember the exact numbers, but I kept a +20% sticky note in my mind.
Indeed, without flambda the compiler cannot inline across functors, but some libraries add functors over functors over functors, which looks like a good design until you notice the performance hit.
Since then, I compile all my opam switches with +flambda and stopped worrying.


I recently built a set benchmarking program which provides evidence strongly suggesting that Base would also speed up substantially with flambda, were it compiled with -O3.

Background: I’m implementing a bootstrap runtime library for Hemlock, a language similar to OCaml, with Base as a strong inspiration for API design (thanks, Jane Street!). As such, Hemlock’s Basis.Ordset module has a quite similar API to Base.Set. Basis.Ordset remains significantly slower than Stdlib.Set and Base.Set, due mainly to ephemeral tuple allocation, and to a lesser extent due to larger nodes.

The bench_set program provides a command line interface for benchmarking numerous set operations. For the following results, I chose the parameters which most closely mimic those you reported. The important thing to note is that although Base.Set speeds up only incrementally with flambda, Basis.Ordset speeds up dramatically. Basis is compiled with -O3 in both cases (‘release’ Dune build profile); were Base also compiled with -O3, I would expect similar results.

As a general remark regarding Stdlib.Set vs Base.Set performance, my ad hoc exploration with bench_set gave me the overall impression that they perform quite similarly, one being slightly faster than the other depending on the particular operation(s) being benchmarked.

4.10.0:

je@maple:~/projects/bench_set [master]> ./_build/default/bin/bench_set.exe -a_n 65536 -a_max 65535 -insert_order incr -ops insert,mem -set_reps 250 -op_reps 65536 -impl stdlib
ns/set_rep: 31_847_053 31847053
alloc_volume/set_rep: 47_197_808 47197808
ns/op_rep: 485 485
alloc_volume/op_rep: 720 720

je@maple:~/projects/bench_set [master]> ./_build/default/bin/bench_set.exe -a_n 65536 -a_max 65535 -insert_order incr -ops insert,mem -set_reps 250 -op_reps 65536 -impl base
ns/set_rep: 36_783_222 36783222
alloc_volume/set_rep: 57_683_336 57683336
ns/op_rep: 561 561
alloc_volume/op_rep: 880 880

je@maple:~/projects/bench_set [master]> ./_build/default/bin/bench_set.exe -a_n 65536 -a_max 65535 -insert_order incr -ops insert,mem -set_reps 250 -op_reps 65536 -impl ordset                                                                                                                                                        
ns/set_rep: 79_569_859 79569859
alloc_volume/set_rep: 125_581_152 125581152
ns/op_rep: 1_214 1214
alloc_volume/op_rep: 1_916 1916

4.10.0+flambda:

je@maple:~/projects/bench_set [master]> ./_build/default/bin/bench_set.exe -a_n 65536 -a_max 65535 -insert_order incr -ops insert,mem -set_reps 250 -op_reps 65536 -impl stdlib
ns/set_rep: 27_283_521 27283521
alloc_volume/set_rep: 47_197_576 47197576
ns/op_rep: 416 416
alloc_volume/op_rep: 720 720

je@maple:~/projects/bench_set [master]> ./_build/default/bin/bench_set.exe -a_n 65536 -a_max 65535 -insert_order incr -ops insert,mem -set_reps 250 -op_reps 65536 -impl base
ns/set_rep: 34_964_646 34964646
alloc_volume/set_rep: 57_681_488 57681488
ns/op_rep: 533 533
alloc_volume/op_rep: 880 880

je@maple:~/projects/bench_set [master]> ./_build/default/bin/bench_set.exe -a_n 65536 -a_max 65535 -insert_order incr -ops insert,mem -set_reps 250 -op_reps 65536 -impl ordset
ns/set_rep: 52_957_685 52957685
alloc_volume/set_rep: 124_788_048 124788048
ns/op_rep: 808 808
alloc_volume/op_rep: 1_904 1904
2 Likes

Yep, and this brings us to another serious point. The whole work done by the flambda team is essentially unused, since when you install from opam most if not all packages are not using the -O option. So we, package maintainers, should probably set at least -O2 by default. I have already started to do this on my packages (at least on 4.07 this option is accepted even when flambda is disabled). Moreover, I was pleasantly surprised to learn that -O2 gives about a +10% performance boost even with flambda disabled.

Otherwise, yes, I would expect flambda to optimize functor applications and function applications more or less equally well. Both look the same from the inside, and yes, functor applications may have better chances because of different inlining scoring, but otherwise they could be treated equally.


We should also consider changing the default in Dune. At least for the release profile.
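For what it's worth, this can already be done per-project via dune's env stanza; a sketch (check the dune documentation for the exact field names, and note that -O2/-O3 only have an effect on flambda compilers):

```dune
(env
 (release
  (ocamlopt_flags (:standard -O2))))
```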


You’re right, I was being too optimistic. I think one could define some other module with the same interface as Base.Set but using a different internal representation. But now that you point it out, I don’t see how to define generic clients.

@ivg, Thank you very much, that is very useful! For completeness, I ran your experiment also using the code I mentioned above (called BaseTree.Map below) to provide a functorial interface over the comparator-less tree structure underlying Base.Map. I set OCAMLPARAM=_,O3=1 and used the 4.10.0+flambda opam compiler, with base 0.13.1 and core 0.13. The results from two machines I have easy access to are:

┌────────────────────┬─────────────────┬───────────────┬───────────────┬───────────────┬────────────┐
│ Name               │        Time/Run │       mWd/Run │      mjWd/Run │      Prom/Run │ Percentage │
├────────────────────┼─────────────────┼───────────────┼───────────────┼───────────────┼────────────┤
│ Caml.Map:16        │        352.24ns │       360.00w │               │               │            │
│ Caml.Map:256       │     22_069.68ns │    12_726.00w │        40.86w │        40.86w │      0.04% │
│ Caml.Map:65536     │ 39_337_683.36ns │ 6_440_616.00w │ 1_737_219.37w │ 1_737_219.37w │     72.58% │
│ BaseTree.Map:16    │        548.08ns │       487.00w │               │               │            │
│ BaseTree.Map:256   │     35_686.96ns │    17_968.00w │        47.13w │        47.13w │      0.07% │
│ BaseTree.Map:65536 │ 49_333_277.10ns │ 9_364_108.00w │ 1_771_943.12w │ 1_771_943.12w │     91.02% │
│ Base.Map:16        │        639.68ns │       546.00w │         0.12w │         0.12w │            │
│ Base.Map:256       │     40_222.41ns │    18_987.00w │        52.95w │        52.95w │      0.07% │
│ Base.Map:65536     │ 52_605_064.54ns │ 9_626_247.00w │ 1_786_918.02w │ 1_786_918.02w │     97.06% │
│ Core.Map:16        │        662.05ns │       546.00w │         0.12w │         0.12w │            │
│ Core.Map:256       │     40_965.57ns │    18_987.00w │        52.79w │        52.79w │      0.08% │
│ Core.Map:65536     │ 54_197_739.94ns │ 9_626_247.00w │ 1_784_075.72w │ 1_784_075.72w │    100.00% │
└────────────────────┴─────────────────┴───────────────┴───────────────┴───────────────┴────────────┘

┌────────────────────┬─────────────────┬───────────────┬───────────────┬───────────────┬────────────┐
│ Name               │        Time/Run │       mWd/Run │      mjWd/Run │      Prom/Run │ Percentage │
├────────────────────┼─────────────────┼───────────────┼───────────────┼───────────────┼────────────┤
│ Caml.Map:16        │        644.17ns │       360.00w │               │               │            │
│ Caml.Map:256       │     34_163.62ns │    12_726.00w │        40.75w │        40.75w │      0.04% │
│ Caml.Map:65536     │ 64_069_232.50ns │ 6_440_616.00w │ 1_735_785.83w │ 1_735_785.83w │     76.37% │
│ BaseTree.Map:16    │        900.92ns │       487.00w │               │               │            │
│ BaseTree.Map:256   │     55_146.99ns │    17_968.00w │        47.17w │        47.17w │      0.07% │
│ BaseTree.Map:65536 │ 77_656_013.72ns │ 9_364_108.00w │ 1_772_474.74w │ 1_772_474.74w │     92.57% │
│ Base.Map:16        │      1_051.02ns │       546.00w │         0.12w │         0.12w │            │
│ Base.Map:256       │     62_036.59ns │    18_987.00w │        52.76w │        52.76w │      0.07% │
│ Base.Map:65536     │ 81_870_699.73ns │ 9_626_247.00w │ 1_784_777.00w │ 1_784_777.00w │     97.59% │
│ Core.Map:16        │      1_049.71ns │       546.00w │         0.12w │         0.12w │            │
│ Core.Map:256       │     61_292.51ns │    18_987.00w │        52.67w │        52.67w │      0.07% │
│ Core.Map:65536     │ 83_889_366.91ns │ 9_626_247.00w │ 1_783_918.55w │ 1_783_918.55w │    100.00% │
└────────────────────┴─────────────────┴───────────────┴───────────────┴───────────────┴────────────┘

So, using the code above is a little faster than Base, but still substantially slower than the Stdlib. Looks like I’ll be writing a bunch of interface glue code soon.


Am I the only one who prefers functors for stylistic reasons?

The comparator thing is a cool trick and makes the code look a little prettier, but I kind of like the explicitness that the functor approach forces. I think OCaml’s explicitness about polymorphism is one of its best features.

I also find that, while the .ml files using the comparator method are slightly cleaner (don’t have to “functorize” everything), the type signatures have a bit more cognitive overhead for me to parse. I’m generally a huge fan of the stylistic conventions used in Jane Street libraries, but functors seem so much more… “OCaml-y” than comparators.

Of course, I use the comparator types when I’m using JS Base, but I sometimes opt out for small programs because I prefer functors (well, and I like faster compile times).
