Hi everyone.
I’m not sure how useful this is to others, but I’ve published my first open source package to opam for anyone who is interested.
It is the same data structure used for editing text in VS Code [1] and AbiWord, with a few modifications to make it persistent and functional.
It provides some nice features on top, which include the following (a rough usage sketch comes after the list):
- Serialising and deserialising the structure to a file for persistent undo/redo.
- Translating between offsets of different Unicode encodings (UTF-8, UTF-16 and UTF-32) for interaction with external systems and languages.
- Rebuilding the structure on demand via a computationally expensive function that optimises it for memory usage and performance. (The idea is for a GUI app to trigger a rebuild while the user is inactive; the core structure is already fast, but the perfectionist in me wanted it to be as fast and as lean in memory as possible.)
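Roughly, the intended interface looks something like this (an illustrative sketch only; the names here are placeholders rather than the exact published API):

```ocaml
(* Illustrative sketch of the kind of interface described above;
   these names are placeholders, not the exact published API. *)
module type PIECE_ROPE_SKETCH = sig
  type t

  val of_string : string -> t
  val insert : t -> int -> string -> t

  (* Translate an offset between Unicode encodings, e.g. when
     talking to a UTF-16 based protocol such as LSP. *)
  val utf8_offset_to_utf16 : t -> int -> int

  (* Serialise the structure (with its history) to a file and load
     it back later, for persistent undo/redo. *)
  val serialise : t -> string -> unit
  val deserialise : string -> t

  (* The expensive rebuild, intended to run while the user is idle. *)
  val rebuild : t -> t
end
```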
There are some similar packages others may find useful, such as Zed, which is a more battle-tested implementation of a similar (but not quite the same) data structure; this package is a new implementation.
Feel free to comment or open issues for any bugs you find during usage. Hopefully someone else out there will find this useful too.
[1] Text Buffer Reimplementation, a Visual Studio Code Story
[2] Improving the Piece Table
Edit: There was some discussion below requesting benchmarks comparing this with other implementations, so I’ve compared this implementation with Zed (used in Utop) and the rope in OCaml-bazaar mentioned below.
This library seems to be roughly competitive with the one in OCaml-bazaar. It is worth noting that the datasets being compared here are real edit traces that likely took hours to write (the Automerge dataset is the edit trace of writing an academic paper).
|              | Svelte | Rustcode | Sephblog | Automerge |
|--------------|--------|----------|----------|-----------|
| piece_rope   | 16 ms  | 31 ms    | 94 ms    | 218 ms    |
| zed_rope     | 267 ms | 1287 ms  | 665 ms   | 1558 ms   |
| ocaml_bazaar | 13 ms  | 86 ms    | 198 ms   | 122 ms    |
13 Likes
Hi !
It seems nice! Here’s a link to the GitHub repository, as I believe other people are going to search for it.
How does it compare to the rope implementation by Jean-Christophe Filliâtre? How does the balancing algorithm you’re using compare to A Functional Implementation of the Garsia–Wachs Algorithm?
2 Likes
Hi there.
Thanks for reviewing and for the links.
I can’t answer your questions unfortunately because I found the existing text buffer implementations in OCaml a bit unintuitive to use.
For example, the function signature `val insert : t -> int -> t -> t` in the rope implementation you linked is unusual to me, because I expect to insert a plain string and get back a data structure with the string inserted. This is likely my own fault, as I’m new to OCaml (although I have some experience with F#).
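To make the contrast concrete, here is a sketch of the interface shape I instinctively expected next to the shape the linked rope exposes (just an illustration, not the actual .mli):

```ocaml
(* The shape of the interface I instinctively expected:
   insert a plain string directly. *)
module type Expected = sig
  type t
  val insert : t -> int -> string -> t
end

(* The shape the linked rope exposes: the inserted text is itself
   a rope, so a plain string has to go through of_string first. *)
module type Linked = sig
  type t
  val of_string : string -> t
  val insert : t -> int -> t -> t
end
```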
I also hadn’t heard of the Garsia–Wachs algorithm, but I appreciate the link as it gives me more to learn from. (Excited to read it.)
I do have benchmarks (which come from replaying real-world editing trace datasets brought together by Joseph Gentle) that can be compared with other structures.
First, here are my results on the different datasets using an AMD Ryzen 3 5000:
- Svelte dataset: 19.198000 ms
- Rustcode dataset: 33.233000 ms
- Sephblog dataset: 105.407000 ms
- Automerge dataset: 233.882000 ms
You can compare these results with those from Rust’s fastest mutable rope implementations (mine isn’t mutable, of course) here. I’ll try to adapt my benchmark code shortly to compare it with other OCaml libraries and post a reply here with the results.
Thanks for the questions. I’m curious to see how my implementation compares with others as well.
3 Likes
@zapashcanon I was a little confused about how to use the rope implementation you linked, because module types (as opposed to plain modules) and functors are still new concepts to me; we don’t have them in F#, which is what I’m more familiar with.
However, I did make the benchmark code generic, so any implementation can easily be run on any of the datasets. Some kind soul who is more familiar with OCaml than I am can hopefully help add benchmarks.
The file where the generic code is run is here. The code is commented with instructions on how to use the generic function to run the datasets with any implementation; someone just needs to copy and paste at the bottom of that file and fill in the blank parts I left, roughly along the lines sketched below.
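For reference, the shape of the generic runner is roughly as follows (a simplified sketch; the real code in the repository differs in its details):

```ocaml
(* Simplified sketch of the generic benchmark shape; the real code
   in the repository differs in its details. *)
module type Edit_target = sig
  type t
  val empty : t
  val insert : t -> int -> string -> t  (* insert text at an offset *)
  val delete : t -> int -> int -> t     (* delete [length] characters at an offset *)
end

(* One recorded edit from a trace: position, number of characters
   to delete, and text to insert. *)
type edit = { pos : int; del : int; ins : string }

module Run_trace (Target : Edit_target) = struct
  (* Replay a whole edit trace against the target structure. *)
  let replay (trace : edit list) =
    List.fold_left
      (fun acc { pos; del; ins } ->
        let acc = if del > 0 then Target.delete acc pos del else acc in
        if ins <> "" then Target.insert acc pos ins else acc)
      Target.empty trace
end
```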
Edit: Zed (used by Utop) seems to be significantly slower on the mentioned datasets. This is news to me as well.
(Reproducible with `dune exec bench/bench.exe --profile=release` after cloning the repository here.)
|            | Svelte | Rustcode | Sephblog | Automerge |
|------------|--------|----------|----------|-----------|
| piece_rope | 16 ms  | 31 ms    | 94 ms    | 218 ms    |
| zed_rope   | 267 ms | 1287 ms  | 665 ms   | 1558 ms   |
Maybe worth saying that @zapashcanon linked the .ml, but you should read the .mli first, because that is where the documentation is.
The module types are used to write the signatures of the actual modules that are defined further on. You probably want the module `S`, which defines an `of_string` function and whose signature includes `ROPE`, so that you can call `S.insert` etc.
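Something along these lines should then work (an untested sketch, written as a functor over the relevant part of the signature):

```ocaml
(* Untested sketch: S provides of_string, and its signature includes
   ROPE, so insert takes another rope rather than a plain string. *)
module Example (S : sig
  type t
  val of_string : string -> t
  val insert : t -> int -> t -> t
end) = struct
  let demo () =
    let r = S.of_string "hello world" in
    S.insert r 5 (S.of_string ", brave")
end
```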
1 Like
Thanks for the hint @threepwood; I wouldn’t have figured that out myself.
@zapashcanon I’ve updated the top post with the times the three libraries (this one, Zed, and the rope in OCaml-bazaar) took to process the edit trace datasets. Zed is the slowest, while my library and the one in OCaml-bazaar seem to have competitive performance.
I think the library in OCaml-bazaar could become even faster than it already is if it reimplemented its delete function.
At the moment, the delete function only removes one character at a time, and this is the code I had to use to delete multiple characters, which is less efficient than it could be. I’ll open an issue about that in case the author wants to improve his implementation.
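The workaround is essentially a loop like the following (a paraphrased sketch, assuming the library exposes a single-character delete along the lines of `delete : t -> int -> t`):

```ocaml
(* Paraphrased sketch: deleting a range by repeated single-character
   deletes, assuming the library exposes something like
   delete : t -> int -> t (remove one character at a position). *)
module Delete_range (R : sig
  type t
  val delete : t -> int -> t
end) = struct
  (* Delete [length] characters starting at [pos]; each step pays the
     full cost of a single-character delete. *)
  let delete_range rope pos length =
    let rec loop rope remaining =
      if remaining <= 0 then rope
      else loop (R.delete rope pos) (remaining - 1)
    in
    loop rope length
end
```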
I was curious how piece_rope compares with other libraries as well, but I likely wouldn’t have bothered to check if others hadn’t mentioned it. So thanks to you both.
1 Like
Interesting, thanks for taking the time to run these benchmarks. Thanks also for opening the issue; I believe the author will be interested.
Which version of the compiler did you use?
There are two other rope libraries in opam, rope and merge-ropes, in case you want to benchmark those too.
1 Like
Thanks for the links. I think I’ll pass on adding those to the benchmark list. The implementation in the first link can’t run these datasets, as it is only designed for appending and concatenating (there is no function to insert at an arbitrary position or to delete), so benchmarking it isn’t really possible.
Merge-ropes actually seems like a cool idea to me, but I think it has different goals that will likely add overhead (such as using Lwt for asynchronous operations and integrating with Irmin, a distributed database related to MirageOS).
The above benchmarks were run with OCaml 5.0.0, for what it’s worth.