Mutable sorted data structure

Hi

I’m looking for a data structure (ideally in Core or Base) that is both mutable and sorted. The requirement is that I need to quickly and often update the structure and also be able to cheaply provide an immutable sorted list containing the items.

I haven’t been able to find anything that satisfies these requirements. For now I’m using a Hash_heap but the act of producing a sorted list even of just a few thousand items is too slow, in the order of 10ms. Ideally it should be under 1ms.

Thanks
Ryan

1 Like

These are some pretty tough requirements, specifically needing both sorting and the ability to iterate through the sorting backwards to create the immutable list.

My question is, is the immutable list creation time dominated by creation/allocation or by lookups in the hash_heap? If it’s allocation, there’s nothing you can do short of creating it optimally (end to beginning). If it’s lookup time, you can roll out your own version of a skip list, making sure the lowest list is double-linked. This will allow you to go back-to-front when creating the immutable list.

Maybe you’ve already considered and rejected, but … how about

('a, 'b) Gmap ref ? That is, an AVL tree (sorted, log-time insert/delete/query) and immutable. But you wrap it in a ref, you can mutate, and still get an immutable copy. Then you can fold it right-to-left to get a list, or (heck) maybe modify your consuming code to use iterators/folds (“catamorphisms” … ha, I sound like I care about category theory!!) and then not bother constructing the list?

If you don’t absolutely need a list, you can consider a data structure that supports an O(1) “freeze” operation of some kind. Something like this:

(* Container.mli *)

type elt = string
type t

(* O(log n) insertion/replacement *)
val add : t -> elt -> unit

(* Immutable, sorted container. *)
module Snapshot : sig
  type t
  val iter : (elt -> unit) -> t -> unit
end

(* O(1) export into an immutable, sorted container. *)
val snap : t -> Snapshot.t
(* Container.ml *)

type elt = string
module Frozen = Set.Make (String)

type t = Frozen.t ref

let add container v =
  container := Frozen.add v !container

let snap container = !container

module Snapshot = struct
  type nonrec t = Frozen.t

  let iter = Frozen.iter
end

(it’s basically the same as suggested by Chet)

Agreed. Mutability should never really be a requirement, since you can always stick an immutable data structure in a ref. Does updating a Base.Map.t in a ref not meet your performance requirements? It seems like the simplest way of doing what you want.

y

1 Like

Thanks for all the replies. I’m busy refactoring my code to make it easier to change the underlying data structure and I’ll update here once I’ve tried some of the options mentioned above.

To be clear, being mutable is not a hard requirement. I assumed, perhaps incorrectly, that frequent updates to a mutable structure would be faster than an immutable one when holding a few thousand items.

You can typically get better constant factors when adding/removing element from a mutable bag (compared to an immutable one), but the price that you have to pay is that freezing (getting the current elements as an immutable collection to use later) is more expensive, as you have to copy the datastructure instead of just sharing it (in the immutable version).

Some data-structures (there are hashmap versions of this) work hard to get the best of both world by generally exposing an immutable interface, with a “transient” mode where you can do a batch of updates imperatively before freezing again, provided that the intermediate states are linear / uniquely owned / not shared with others.

This requires a good bit of engineering work to get right; if you need very fast “freeze” I would also first try the much simpler “a reference on a good immutable structure” first.

You can’t have both O(1) insertion and O(1) freezing, because that would mean you can sort arbitrary data in linear time. So the O(log n) insertion and O(1) freezing of an immutable data structure is the best you can do in theory.

2 Likes

Thanks all for your suggestions. I’m now using an immutable Set (from Core) and performance has improved significantly.

Timings went from around 5ms to 1ms in my use case.

2 Likes