Use arrays or hash tables in a Map implementation?

I’m working on a trie implementation so I can lookup keys with small differences efficiently, and right now I’m trying to optimize the storage of each node in the trie. Since I’m much more concerned about read speed than write speed, I’d like to store each node as a hash table, and just replace the entire hash table every time I add a key. Unfortunately OCaml won’t even let me use a hash table (or an array) in the type:

Signature mismatch:
       ...
       Type declarations do not match:
         type 'a t = None | One of key * 'a | Many of (key * 'a) array
       is not included in
         type +'a t
       File "fast_char_trie.ml", line 5, characters 4-85: Actual declaration
       Their variances do not agree.

It seems like OCaml is angry because I’m trying to use a mutable type inside of a type that’s required to be immutable, but is there a way to tell it that I want to do that anyway, and I’ll jump through the hoops to make it look immutable from the outside?

The type +'a t includes a variance annotation and the compiler tells you that the inferred variance does not match the declared variance.

2 Likes

The issue here is that OCaml doesn’t know that you’re not exposing the imperative capabilities of the array, and so can’t expose the parameter as covariant. If you’re confident that you’re not exposing those capabilities, you can force the covariance in place using Obj.magic. This is actually done in Async, to establish the covariance of the type parameter of the Deferred.t.

Of course, Obj.magic is a tool to be used rarely and carefully, but this is an example of a reasonable usage, I believe.

y

1 Like

Ahh I see, I can declare my type to be covariant in the .mli, then in the .ml, declare a real type (which isn’t) and use Obj.magic to convert between the internal type and the public type. Thanks!

It pains me to see Obj.magic as the recommended solution… so I’ll make two additional suggestions.

1- Do you really need the type to be covariant? Variance annotations are useful for some programming styles that rely on OCaml’s subtyping, but a lot of programs work just fine without.

2- Instead of an array or hash table, could you use a purely functional data structure at each Many node? Binary search tree, etc.

1 Like

Removing the covariance annotation was what I did for testing, but I’d like to make this module have the same interface as Map (plus some extra functions).

Looking at my benchmarks, I suspect right now I’m limited by algorithmic problems (checking the same node way too many times), but I suspect that if I can make that work, I can get a little more speed out of this by using an array-based binary tree for each node (due to better cache usage). Since the whole point of this data structure is that I need it to be really, really fast, I think it’s a reasonable case for trading off some safety for speed.

It would be nice if there was a built-in immutable array data structure (for when setup time doesn’t matter but lookup speed does).

For what it’s worth, it’s not subtyping precisely that drove us to want to get the correct covariance annotation for deferreds; it was issues with the relaxed value restriction.

It would be nice if there was a reliable way to prove to OCaml that one is not using the imperative side of a data structure. It would be neat if you could break up an imperative data structure into two separate APIs, one read-only and one write-only, with the appropriate variances. But that does seem like a hard thing to add to the language now.

y

Amusingly, I’m pretty sure you could make that typecheck without any magic with an affine type system. Something like that (considering an affine kind system in the style of Alms):

module Split (M : sig type 'a t : unrestricted .... end) : sig
  type -'a w : affine
  type +'a r : unrestricted
  val freeze: 'a w -> 'a r
  val set : 'a w -> 'a -> 'a w
  val get : 'a r -> 'a
  val make : 'a -> 'a w
end
1 Like