Standardized mutable data structures

I wondered repeatedly, when you wrote “O(1) insertion and deletion given a reference to a node” if what you meant was, “given a reference to a node, insert a new node ahead of or behind this node, or delete this node”. In which case, hashtbl is not what you want, that’s fer sure.

OK, I get your point :wink:

If I am interpreting your question as a more meta-level question of ‘Can we have a mutable doubly linked list in the OCaml standard library? And if not, why not?’…then how about opening a GitHub ticket in the OCaml repo with a concrete suggestion? The maintainers have added lots of new modules and functionality to the Stdlib in recent years, your idea may well end up in there too!

A doubly linked list is a zipper of a list (original Gérard Huet’s paper about zipper data structure), so I don’t see why you want a mutable data structure for this.

The idea behind this data structure is really a zipper of your coat (hence its name): when you want to open or close your coat, you move its zipper (a cursor) up and down. Its type is the following:

type 'a zipper = {left : 'a list: right : 'a list}

This way the list [1; 2; 3; 4] when the cursor is at the beginning will be represented by the zipper {left = []; right = [1; 2; 3; 4]} . Each time you move to the right, you pop the value from the right list and push it on the left (same when you move to the left). This way the same list as before, when the cursor is on 3, will be represented by the zipper {left = [2; 1]; right = [3; 4]}.

An elementary implementation of basic function is as follow:

type 'a zipper = {left = []; right = []}

let of_list l = {left = []; right = l}
let rec to_list {left; right} = match left with
  | [] -> right
  | x :: left -> to_list {left; right = x :: right}

let cursor z = match z.right with [] -> None | x :: _ -> Some x;;

let go_left z = match z.left with
  | [] -> z
  | x :: left -> {left; right = x :: z.right}

let go_right z = match z.right with
  | [] -> z
  | x :: right -> { left = x :: z.left; right}

(* insertion is O(1) *)
let insert a z = {z with right = a :: z.right}

(* deletion is O(1) *)
let delete z = match z.right with
  | [] -> z
  | x :: right -> {z with right}

Some use cases:

of_list [1; 2; 3; 4] |> go_right |> go_right;;
- : int zipper = {left = [2; 1]; right = [3; 4]}

of_list [1; 2; 3; 4] |> go_right |> go_right |> cursor;;
- : int option = Some 3

of_list [1; 2; 3; 4] |> go_right |> go_right |> insert 5 |> to_list;;
- : int list = [1; 2; 5; 3; 4]

of_list [1; 2; 3; 4] |> go_right |> go_right |> delete |> to_list;;
- : int list = [1; 2; 4]

You would want a mutable double linked list for speed and allocation rate, where it makes sense.

That said I’ve always used them as intrusive structures: the pointers are part of some record that also contains the useful data. This gives trivially access to the list node from the data it contains, so you can add/remove just by swapping pointers. A generic version is less useful in my eyes.

2 Likes

That is only true if you never have more than one reference into a list. As soon as you have more than one, the zipper approach breaks down. For example, it is impossible to write an O(1) splice function using zippers, while it is trivial on a doubly-linked list. Mutable data structures are sometimes useful.

4 Likes

I understand why you sometimes want mutable data structure (mostly for efficiency reason), but I was more astonished by the fact of having double linked list mutable in the standard lib, since the single linked version in OCaml is immutable.

Nevertheless, if such data structures have to exist, I’ll be more confident if they were in the standard lib. Look at your PR for dynarray: it’s here for efficiency, but for safety reason we need a type isomorphic to 'a option (this one is not a good choice for efficiency reason, you don’t want to box the value in the underlying array), and so @chambart made a PR to have an equivalent to null pointer in a “safe” way. And when I read the discussion on your PR, I see it’s not obvious to write memory safe code with mutability even for persons deeply aware of the memory model of OCaml.

1 Like

Yes, Dynarray is a particularly gnarly structure because it contains a partially uninitialized array. That’s really stuff for the Higher OCaml Runtime Wizards. Doubly linked lists are really trivial in comparison :slight_smile:

2 Likes

One slight disadvantage with adding something like a doubly linked list (that doesn’t require new language features) to the stdlib is that now your code will depend on OCaml compiler >= very new version.
Everyone who is still on the older version of the compiler either: cannot use your library, or needs to use a stdlib shim that backports the newly added module to old compilers.
It is a lot easier to just depend on an opam library that implements the doubly linked list directly then or an opam library that conditionally chooses which implementation to use, it’ll be years before it can use the version in the stdlib directly.

With an opam library anyone can just start using it immediately without the added complexity of worrying about compiler version.

Having said that an interface might be useful to have in the stdlib, so that various implementations do not diverge. And then you don’t necessarily have to use a new compiler version, or a stdlib shim, you can keep using the small doubly-linked list opam library, knowing that at some point it might become just an alias to something the compiler ships.
(where by interface I mean something like Seq, Iter that is more generic and not tied to a single data structure, what would be the equivalent of that for mutable sequential data structures?)

Or upgrade to the new OCaml version :wink:

It would seem that a first step would be for library-writers to standardize on a set of interfaces that their implementations of various core data-structures should provide. So, a collection of signatures, which then the implementors would use on their module implementations.

1 Like

This is a good point about why OCaml’s stdlib is the right place for a Dynarray. When this topic occurred earlier, Adding Dynamic Arrays / Vectors to StdLib, @gasche commented on the preliminary steps to be taken near the end of that thread; the first steps being to enumerate the prior art and compare them. I looked for such a survey and didn’t find any. Is there such a thing? If not, where should it, reside? Perhaps in the underused RFC dir RFCs/rfcs at master · ocaml/RFCs · GitHub?

I don’t only have the problem of uninitialized slot in mind, but also the question about maintaining structure invariant in presence of concurrent and parallel access to values (see, for instance, this issue with Buffer): what kind of warranties should be provided to the user by the library author?

I don’t think a zipper of a list is equivalent to a doubly-linked list even in the case of one iterator because you should be able to push and pop from the ends as well.

If you want a doubly-linked list for speed then you probably want to embed it in an arena of some kind so you aren’t heap allocating individual nodes and writing lots of pointers.

Perhaps off-topic but that is an argument for merging data structures with the run-time: the GC could “know” which pointers to follow so the mutators don’t need to NULL fill everything.