I’m not sure how useful this is to others, but I published my first open source package to opam which can be used by anyone interested in it.
It is the same data structure used for editing text in VS Code  and AbiWord, with a few modifications to make it persistent and functional.
It provides some nice features on top which include:
- Serialising and deserialising the structure to a file for persistent undo/redo.
- Translating between offsets of different Unicode encodings (UTF-8, UTF-16 and UTF-32) for interaction with external systems and languages.
- There is a computationally expensive function to rebuild the structure, optimising it for memory usage and performance. (The idea was to use this in a GUI app when the user is inactive and rebuilding in that case - the core structure is fast but the perfectionist in me wanted it to be as fast and lean in memory as possible.)
There are some similar packages others may find useful like Zed which are a more battle-tested implementation of a similar-but-not-quite-the-same data structure as this is a new implementation.
Feel free to comment or open issues for any bugs you find during usage. Hopefully someone else out there will find this useful too.
 Text Buffer Reimplementation, a Visual Studio Code Story
 Improving the Piece Table
Edit: There was some discussion requesting a benchmark with comparison with other implementations below. So I’ve compared this implementation with Zed (used in Utop) and the Rope in Ocaml-bazaar mentioned below.
This library seems to be roughly competitive with the one in Ocaml Bazaar. It is worth noting that these datasets they’re being compared contain edit traces that likely took hours to write (the Automerge dataset is an edit trace of writing an academic paper).
It seems nice ! Here’s a link to the GitHub repository as I believe some other people are going to search for it.
How does it compare to the rope implementation by Jean-Christophe Filliâtre ? How does the balancing algorithm you’re using compares to A Functional Implementation of the Garsia–Wachs Algorithm ?
Thanks for reviewing and for the links.
I can’t answer your questions unfortunately because I found the existing text buffer implementations in OCaml a bit unintuitive to use.
For example, the function signature
val insert : t -> int -> t -> t in the rope implementation you linked is unusual to me because I expect to insert a plain string and get back a data structure with the string inserted. This is likely my own fault due to being new to OCaml (although I have some experience with F#).
I also haven’t heard of the Garsia-Wachs algorithm but I appreciate your link as it is more for me to learn from. (Excited to read it.)
I do have benchmarks (which come from running real world editing trace datasets brought together by Joseph Gentle) that can be compared with other structures.
First, here are my results on the different datasets using an AMD Ryzen 3 5000:
- Svelte data set: 19.198000 ms
- Rustcode dataset: 33.233000 ms
- Sephblog dataset: 105.407000 ms
- Automerge dataset: 233.882000 ms
You can compare these results with the results from Rust’s fastest mutable Rope implementations (of course mine isn’t mutable) here. I’ll try to edit my benchmark code in a bit to compare it with other OCaml libraries and post a reply here with results soon.
Thanks for the questions. I’m curious to see how my implementation compares with others as well.
@zapashcanon I was a little confused how to use the rope implementation you linked because module types (not plain modules) and functors are still a new concept to me and we don’t have them in F# which is what I’m more familiar with.
However, I did make the benchmark code generic so any implementation can easily be run on any of the datasets. Some kind soul who is more familiar with OCaml than me can hopefully help add benchmarks.
The file where the generic code is run is here. The code is commented on how to use the generic function to run the datasets with any implementation. Someone just needs to copy and paste at the bottom of that file and fill in the blank parts I left.
Edit: Zed (used by Utop) seems to be significantly slower on the mentioned datasets. This is news to me as well.
dune exec bench/bench.exe --profile=release after cloning the repository here.)
Maybe worth saying that @zapashcanon linked the .ml, but you should read the .mli first, because this is where the documentation is.
The module types are used to write the signatures of actual modules that are defined further away. You probably want the module
S, which defines an
of_string function, and whose signature includes
ROPE so that you can call
Thanks for the hint @threepwood; I wouldn’t have figured that out myself.
@zapashcanon I’ve updated the top post with the time the three libraries (this, Zed, the Rope in OCaml-Bazaar) took to process edit trace datasets. Zed is the slowest while my library and the one in OCaml-Bazaar seem to have competitive performance.
I think the library in OCaml-Bazaar can become even faster than it already is if it reimplements its delete function.
So far, the delete function only deletes one character at a time and this is the code I had to use to delete multiple characters, which is less efficient than it could be. I’ll open an issue about that in case the author wants to improve his implementation.
I was curious about the performance of this piece_rope compared to other libraries as well but likely wouldn’t have bothered to check if others didn’t mention it. So thanks to you both.
Interesting, thanks for taking the time to run these benchmarks. Thanks also for opening the issue, I believe the author will be interested.
Which version of the compiler did you use ?
There’s two other rope libraries in opam, rope and merge-ropes, in case you want to benchmark these too.
Thanks for the links. I think I’ll pass on adding those to the benchmarks list. The implementation in the first link can’t run these datasets as it’s only designed to append or concatenate (no function to insert at any random location or delete) so that’s not really possible.
Merge-ropes actually seems like a cool idea to me but I think it has different goals that will likely give it more overhead (like using Lwt for async operations and integration with Irmin which is a distributed database related to MirageOS).
The above benchmarks were run with Ocaml.5.0.0 for what it’s worth.