Hi everyone, I’d like to announce the first release of zarr, an OCaml implementation of the Zarr version 3 storage format specification for chunked and compressed multi-dimensional arrays, designed for use in parallel computing.
Why?
The project was mainly inspired by the lack of functional programming language implementations of this specification, as shown in this implementations table. Since I have been learning OCaml these past few months, I figured I’d take on the challenge of producing the first functional programming implementation of Zarr, and it was a great learning experience!
Features
- Supports creating n-dimensional Zarr arrays and chunking them along any dimension.
- Supports compressing array chunks using a variety of compression codecs.
- Supports indexing operations to read/write views of a Zarr array.
- Supports storing arrays in memory or on the local filesystem. It is also extensible, allowing users to easily create and use their own custom storage backends. See the example implementing a Zip file store for more details.
- Supports both synchronous and concurrent I/O via Lwt and Eio.
- Leverages the strong type system of OCaml to create a type-safe API, making it impossible to create, read or write malformed arrays.
- Supports organizing arrays into hierarchies using Groups.
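To make the chunking feature concrete, here is a minimal sketch of the arithmetic behind a regular chunk grid, using the same array and chunk sizes as the demo in this post. It does not use the zarr library; `ceil_div` and `grid_shape` are hypothetical helper names chosen for illustration, and the library's internals may differ.

```ocaml
(* Illustrative only: plain OCaml, not part of the zarr API. *)

(* Ceiling division for positive integers. *)
let ceil_div a b = (a + b - 1) / b

(* Number of chunks along each dimension of a regular chunk grid. *)
let grid_shape ~shape ~chunks =
  if Array.length shape <> Array.length chunks then
    invalid_arg "grid_shape: shape and chunks must have the same rank";
  Array.map2 ceil_div shape chunks

let () =
  let grid = grid_shape ~shape:[|100; 100; 50|] ~chunks:[|10; 15; 20|] in
  let total = Array.fold_left ( * ) 1 grid in
  (* Chunks at the edge of a dimension may be only partially covered
     by the array, e.g. 100 is not a multiple of 15. *)
  Array.iter (Printf.printf "%d ") grid;
  Printf.printf "\ntotal chunks: %d\n" total
```

With these sizes the grid is 10 x 7 x 3, so the array is split into 210 chunks.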
Example
Below is a demo of the library’s API for creating, reading and writing to a Zarr hierarchy.
open Zarr
open Zarr.Metadata
open Zarr.Node
open Zarr.Codecs
open Zarr.Indexing
open Zarr_sync.Storage
(* opens infix operators >>= and >>| for monadic bind & map *)
open FilesystemStore.Deferred.Infix
let store = FilesystemStore.create "testdata.zarr" in
let group_node = GroupNode.of_path "/some/group" in
FilesystemStore.create_group store group_node;
let array_node = ArrayNode.(group_node / "name") in
(* creates an array with char data type and fill value '?' *)
FilesystemStore.create_array
~codecs:[`Transpose [|2; 0; 1|]; `Bytes BE; `Gzip L2]
~shape:[|100; 100; 50|]
~chunks:[|10; 15; 20|]
Ndarray.Char
'?'
array_node
store;
let slice = [|R [|0; 20|]; I 10; R [||]|] in
let x = FilesystemStore.read_array store array_node slice Ndarray.Char in
(* Do some computation on the array slice *)
let x' = Zarr.Ndarray.map (fun _ -> Random.int 256 |> Char.chr) x in
FilesystemStore.write_array store array_node slice x';
let y = FilesystemStore.read_array store array_node slice Ndarray.Char in
assert (Ndarray.equal x' y);
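For readers new to Zarr, the sketch below shows how a single element coordinate maps to a chunk, and what that chunk's key looks like under the Zarr version 3 default chunk key encoding with the "/" separator. It is plain OCaml and does not call the zarr library; `locate` and `chunk_key` are hypothetical names, and I am assuming here, without having confirmed it, that an array created as in the demo uses the default encoding.

```ocaml
(* Illustrative only: plain OCaml, not part of the zarr API. *)

(* For each dimension, return (index of the chunk, offset inside that chunk). *)
let locate ~chunks coord =
  Array.map2 (fun c n -> (c / n, c mod n)) coord chunks

(* Key of a chunk under the Zarr v3 default chunk key encoding,
   assuming the "/" separator: the prefix "c" followed by the chunk indices. *)
let chunk_key chunk_index =
  chunk_index
  |> Array.to_list
  |> List.map string_of_int
  |> List.cons "c"
  |> String.concat "/"

let () =
  (* Element (5, 10, 45) of an array chunked as 10 x 15 x 20. *)
  let loc = locate ~chunks:[|10; 15; 20|] [|5; 10; 45|] in
  let key = chunk_key (Array.map fst loc) in
  print_endline key
```

Here the element falls in chunk (0, 0, 2), so the printed key is `c/0/0/2`.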
Installation
The library comes in several flavors depending on the synchronous / asynchronous backend of choice. To install the synchronous API, use
$ opam install zarr-sync
To install zarr with an asynchronous API powered by Lwt or Eio, use
$ opam install zarr-lwt
$ opam install zarr-eio
The documentation can be found here and the source code there.
I’m happy to answer any questions regarding the library, and I more than welcome suggestions for improvements (especially performance!), issue reports, and PRs.