[ANN] vec 0.2.0

I’ve just released version 0.2.0 of vec, a library for safe dynamic arrays with Rust-like mutability permissions.

You can find the package on opam here, and the source repository here.

This release adds new APIs for filtering and comparing vectors, as well as some bug fixes.

Breaking changes from 0.1.0:

  • Some functions were renamed to conform to Stdlib's conventions: any → exists, all → for_all
  • Potentially-unsafe APIs for directly creating vectors with a buffer and accessing vectors’ buffers were removed

Looking for feedback and suggestions!

A minor remark: I find it remarkable how closely the proposed API mirrors the one of the BatArray.Cap interface, an Array submodule doing essentially the same thing contributed by David Teller in 2008. (Many details are different as vec offers dynamically-resizable arrays, while Array.Cap is fixed-size arrays, but this is orthogonal to the static control over mutability.)

To me this suggests that the vec API is not actually specific to Rust, or at least that the inspiration arrived at the same point as the long tradition of “phantom types” in ML-family languages. (In this space I think the key idea popularized by Rust would be ownership (possibly with borrowing), and in particular the idea that by default mutable values should be uniquely-owned, while immutable values can easily be shared.)

This is not a criticism of the library itself! I very much like the idea of having small modules that cover simple needs, rather than large monolithic libraries.

Question: in Batteries, my impression is that Array.Cap was never used much. I would guess that the reason was that, for most users, the static guarantees of the interface did not offset the (mild) cost of the more complex types to manage. What is/are your use-case(s) where reasoning about mutation is important?

I didn’t know about that module. They are indeed very similar.

Regarding your second point: yes, this isn’t really specific to Rust; Rust just popularized the idea.
My initial inspiration was this presentation by Yaron Minsky, where he does a similar thing, but for a ref-like type. My initial reaction was “Hey, that looks a lot like Rust’s references”.

Honestly, I started this project more as a fun exercise than to meet a real-world use case, but I assume there are situations where the mutability control comes in handy. For example, if you want to pass a buffer to some function to fill but don’t want that function to read the buffer’s current contents, you could pass an ('a, [`W]) Vec.t instead of allocating a new buffer.
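
A minimal sketch of that write-only hand-off, assuming a toy stand-in for the library: the Pvec module, its list-backed representation, and the fill function are hypothetical and are not vec's actual API.

```ocaml
(* Hypothetical stand-in for a permissioned vector; NOT vec's implementation. *)
module Pvec : sig
  type ('a, 'p) t
  val create : unit -> ('a, [ `R | `W ]) t
  val as_write_only : ('a, [> `W ]) t -> ('a, [ `W ]) t
  val push : 'a -> ('a, [> `W ]) t -> unit
  val to_list : ('a, [> `R ]) t -> 'a list
end = struct
  (* 'p is a phantom parameter: it does not appear on the right-hand side. *)
  type ('a, 'p) t = 'a list ref (* newest element first *)
  let create () = ref []
  let as_write_only v = v (* same storage, narrower type *)
  let push x v = v := x :: !v
  let to_list v = List.rev !v
end

(* The callee can append, but calling Pvec.to_list on [out] would not type-check. *)
let fill (out : (int, [ `W ]) Pvec.t) : unit =
  List.iter (fun x -> Pvec.push x out) [ 1; 2; 3 ]

let () =
  let v = Pvec.create () in
  Pvec.push 0 v;
  fill (Pvec.as_write_only v);
  assert (Pvec.to_list v = [ 0; 1; 2; 3 ])
```

The caller keeps the read-write handle, so it can still inspect the buffer after the callee has filled it.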

Interestingly it also looks very similar to containers’ CCVector, which is a resizable array with read and write permissions using phantom types. (see CCVector (containers.CCVector))

And to answer gasche’s question, personally I like having a vector that is immutable, after building it using mutable means. It’s like a list but it can be right appended to easily.
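
As a rough illustration of "build mutably, then hand out something immutable", here is a self-contained sketch. The Fvec module and the squares function are made up for this example and do not reflect CCVector's or vec's real API; the array-append representation is chosen for brevity, not efficiency.

```ocaml
(* Hypothetical module for illustration only. *)
module Fvec : sig
  type ('a, 'p) t
  val create : unit -> ('a, [ `R | `W ]) t
  val push : ('a, [> `W ]) t -> 'a -> unit
  val get : ('a, [> `R ]) t -> int -> 'a
  val length : ('a, [> `R ]) t -> int
  (* Returns a read-only copy, so later pushes on the source are not visible. *)
  val freeze : ('a, [> `R ]) t -> ('a, [ `R ]) t
end = struct
  type ('a, 'p) t = 'a array ref
  let create () = ref [||]
  (* O(n) per push; a real vector would over-allocate instead. *)
  let push v x = v := Array.append !v [| x |]
  let get v i = !v.(i)
  let length v = Array.length !v
  let freeze v = ref (Array.copy !v)
end

(* Build with mutation, return a value nobody can mutate afterwards. *)
let squares n : (int, [ `R ]) Fvec.t =
  let v = Fvec.create () in
  for i = 0 to n - 1 do
    Fvec.push v (i * i)
  done;
  Fvec.freeze v
```

Because the read-write handle never leaves squares, callers only ever see the frozen result.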

I like how CCVector defines types ro and rw, which results in more readable function signatures. The concept of read-only and read-write arrays is simple enough, but at least for me types like

val as_write_only: ('a, [> `W]) t -> ('a, [`W]) t

look busy, especially when 'a is a more complicated type. So I would suggest to steal that idea.
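
For illustration, this is roughly what such aliases could look like when applied to `R / `W style permissions. The alias names ro, wo and rw and the Vec_sketch module are assumptions made for this sketch, not definitions taken from CCVector or vec.

```ocaml
(* Hypothetical permission aliases. *)
type ro = [ `R ]
type wo = [ `W ]
type rw = [ `R | `W ]

module Vec_sketch : sig
  type ('a, 'p) t
  val create : unit -> ('a, rw) t
  (* Compare with: ('a, [> `W]) t -> ('a, [`W]) t *)
  val as_write_only : ('a, [> wo ]) t -> ('a, wo) t
  val as_read_only : ('a, [> ro ]) t -> ('a, ro) t
  val push : ('a, [> wo ]) t -> 'a -> unit
  val length : ('a, [> ro ]) t -> int
end = struct
  type ('a, 'p) t = 'a list ref
  let create () = ref []
  let as_write_only v = v
  let as_read_only v = v
  let push v x = v := x :: !v
  let length v = List.length !v
end
```

The open forms [> wo ] and [> ro ] still accept a read-write vector, so the aliases change only how the signatures read, not what they accept.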

I had a quick glance at the documentation and I am confused by some of the types. Why do clear and set_growth_rate require the vector to be readable? For example, the growth rate seems meaningful only for operations like push, yet the latter only requires writability. Also, growth_rate requires the vector to be readable but capacity does not.

Also, is there a use case for write-only vectors? It seems everything could have been done with just two input types: [>] and [>`W].

You’re right, some of the functions are inconsistent. In some places I’ve considered that permissions “are more than the sum of their parts” i.e. if you only have a [`R] or [`W] vec, then you’ve been handed a vec owned by someone else to read from/fill out, but if you have a [`R | `W] vec then you “own” it, and thus can do more operations on it.

But looking over the docs again I see I’ve applied this inconsistently. Also operations like set_growth_rate and shrink_to_fit are a bit more complicated: If you have a [`W] vector, does that mean you can only mutate/add/remove elements from it, or does it mean you can directly mutate internal properties like capacity and growth rate? (That question is ultimately for me to answer, but I’m writing it to see others’ thoughts)

Regarding use cases for write-only vectors, I’ve mentioned one earlier, though I imagine it’s pretty niche indeed.

You might find the documentation of the Perms library in Core_kernel to be interesting:

https://ocaml.janestreet.com/ocaml-core/v0.12/doc/core_kernel/Core_kernel/Perms/index.html

This establishes idioms that are used across a variety of permissioned types in our codebase. Notably, it distinguishes between read-only values (which don’t directly support mutation) and immutable values (which no one has a write handle to), which we’ve found to be a useful distinction. It also highlights some usage patterns that help avoid common mistakes in using phantom types correctly.
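
To make the read-only versus immutable distinction concrete, here is a small self-contained sketch. It does not use Core_kernel: the Cell module, the permission types, and the `Frozen tag are invented for illustration and are not the definitions Perms uses.

```ocaml
(* Hypothetical permission types; not the ones defined by Perms. *)
type read = [ `Read ]
type read_write = [ `Read | `Write ]
type immutable = [ `Read | `Frozen ]

module Cell : sig
  type ('a, 'p) t
  val create : 'a -> ('a, read_write) t
  val get : ('a, [> read ]) t -> 'a
  val set : ('a, [> `Write ]) t -> 'a -> unit
  (* Same storage, fewer rights: writers elsewhere can still change it. *)
  val read_only : ('a, [> read ]) t -> ('a, read) t
  (* Fresh storage that no one holds a write handle to. *)
  val freeze : ('a, [> read ]) t -> ('a, immutable) t
end = struct
  type ('a, 'p) t = 'a ref
  let create x = ref x
  let get c = !c
  let set c x = c := x
  let read_only c = c
  let freeze c = ref !c
end

(* Only accepts cells whose contents can never change. *)
let cache_key (c : (string, immutable) Cell.t) : string = Cell.get c

let () =
  let c = Cell.create "v1" in
  let view = Cell.read_only c in
  let frozen = Cell.freeze c in
  Cell.set c "v2";
  assert (Cell.get view = "v2");   (* the read-only view sees the write *)
  assert (cache_key frozen = "v1") (* the frozen copy does not *)
```

Passing view to cache_key is rejected by the type checker, which is the point: a read-only handle says nothing about whether someone else can still write.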

And, in the same spirit as the other posts, I would like to share a pull request on ocaml-cstruct, which is a nice discussion about capabilities and how to introduce them into an existing codebase.

However, as far as I can tell, we don’t really use it widely, and we should. The main problem is the cost of upgrading old code that uses cstruct to this interface, since it adds new constraints (which can reveal some “bugs” along the way).

I recently played around with my own capabilities interface for references, and came to some preferences of my own that might be of interest:

  • I like to have the initial reference be non-readable and non-writeable - this adds the feature of my code being explicit in all places where I pass on the capability of either a readable or writeable reference
  • In contrast with your interface, I made my read function depend on a closed read-only type. This way the capability restrictions don’t just get communicated from caller to callee, but also from callee to caller. E.g. if a function only ever reads a reference, then it demands that its argument doesn’t support writing.

My reference interface:

type ('a, 'b) t

val make : 'a -> ('a, [ `init ]) t

(*> For caller restricting callees*)
val to_reader : ('a, [< `init | `write ]) t -> ('a, [ `read ]) t

val to_writer : ('a, [ `init ]) t -> ('a, [ `write ]) t

(*> For callees restricting caller*)
val read : ('a, [ `read ]) t -> 'a

(*> Don't use if there is no need to write*)
val read_any : ('a, [< `write | `read ]) t -> 'a

val write : ('a, [ `write ]) t -> 'a -> unit

Edit: The reason for my first point is also that I find it dangerous for the default to be a writeable reference, as in my experience references often only need to be written to from a single place. Combined with subtyping of the phantom type in the read function’s signature, I feel this can lead to careless use of the interface. The whole point of this is to be much more restrictive, so why go half-way?
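
For readers who want to try the interface above, here is one possible implementation backed by a plain ref. The signature is copied from the post; the module name Cap_ref, the implementation, and the log_value function are assumptions made for this sketch.

```ocaml
module Cap_ref : sig
  type ('a, 'b) t
  val make : 'a -> ('a, [ `init ]) t
  val to_reader : ('a, [< `init | `write ]) t -> ('a, [ `read ]) t
  val to_writer : ('a, [ `init ]) t -> ('a, [ `write ]) t
  val read : ('a, [ `read ]) t -> 'a
  val read_any : ('a, [< `write | `read ]) t -> 'a
  val write : ('a, [ `write ]) t -> 'a -> unit
end = struct
  (* The second parameter is phantom; every capability shares one ref. *)
  type ('a, 'b) t = 'a ref
  let make x = ref x
  let to_reader r = r
  let to_writer r = r
  let read r = !r
  let read_any r = !r
  let write r x = r := x
end

(* A callee that declares, through its type, that it only ever reads. *)
let log_value (r : (int, [ `read ]) Cap_ref.t) : string =
  string_of_int (Cap_ref.read r)

let () =
  let cell = Cap_ref.make 0 in
  let writer = Cap_ref.to_writer cell in
  let reader = Cap_ref.to_reader cell in
  Cap_ref.write writer 42;
  assert (log_value reader = "42")
  (* Cap_ref.write reader 1 and Cap_ref.read cell are both type errors. *)
```

Every place that hands out a capability is visible as an explicit to_reader or to_writer call, which matches the first bullet above.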