I have a problem where I need to create a Bigarray with a specific element type using information read from a JSON file. I must read a string from a JSON field which can be one of {"float32", "float64", "int8", "int16", "int32", "int64", "complex32", "complex64"}. Based on the string value I must construct an empty Bigarray whose elements have the datatype implied by that string. For example, "float32" means a big array of Bigarray.Float32 kind must be created.
I tried writing a function to do the conversion but it won’t pass the type checker:
It would be nice if you showed us the type signature you are trying to implement. But it's likely that you will have to wrap your bigarrays in an existential to hide the polymorphic types. Something like:
type bigarray = Bigarray : ('a, 'b, B.c_layout) B.Genarray.t -> bigarray
let of_datatypes dims = function
| "float64" -> Bigarray (B.Genarray.create B.float64 B.c_layout dims)
…
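Filled out, the existential-wrapper sketch might look like this: a minimal, self-contained version assuming B is an alias for Bigarray; the Any constructor name, the result-based error handling and the shape_of helper are illustrative:

```ocaml
module B = Bigarray

(* Existential wrapper: the element and kind types are hidden. *)
type any_array = Any : ('a, 'b, B.c_layout) B.Genarray.t -> any_array

let of_datatypes dims = function
  | "float32" -> Ok (Any (B.Genarray.create B.float32 B.c_layout dims))
  | "float64" -> Ok (Any (B.Genarray.create B.float64 B.c_layout dims))
  | "int8" -> Ok (Any (B.Genarray.create B.int8_signed B.c_layout dims))
  | "int16" -> Ok (Any (B.Genarray.create B.int16_signed B.c_layout dims))
  | "int32" -> Ok (Any (B.Genarray.create B.int32 B.c_layout dims))
  | "int64" -> Ok (Any (B.Genarray.create B.int64 B.c_layout dims))
  | "complex32" -> Ok (Any (B.Genarray.create B.complex32 B.c_layout dims))
  | "complex64" -> Ok (Any (B.Genarray.create B.complex64 B.c_layout dims))
  | s -> Error ("unknown datatype: " ^ s)

(* The wrapped array can only be used kind-generically, e.g.: *)
let shape_of any = match any with Any a -> B.Genarray.dims a
```

Note that once wrapped, the array can only be used through operations that work for every kind, which is exactly the limitation discussed below.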
Before reaching for GADTs, it might be better to check whether the of_datatypes function is needed at all. In particular, if the various cases are dispatched to different code paths once parsed, a simpler solution might be to dispatch the different cases directly with:
match input_kind with
| "float64" -> float_case (B.Genarray.create B.float64 B.c_layout dims)
| "float32" -> float32_case ...
...
(which is equivalent to having a record of continuations in of_datatypes, one for each case, which is in turn equivalent to hiding the bigarray types behind an existential quantification; but I think it is better to avoid climbing up the type-complexity ladder by accident).
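To illustrate that equivalence, a record-of-continuations version might look like the following sketch, shown with only two kinds; the handlers type, its field names and the dispatch function are made up for the example:

```ocaml
module B = Bigarray

(* One continuation per supported kind; 'r is the common result type. *)
type 'r handlers = {
  on_float64 : (float, B.float64_elt, B.c_layout) B.Genarray.t -> 'r;
  on_float32 : (float, B.float32_elt, B.c_layout) B.Genarray.t -> 'r;
}

let dispatch dims handlers = function
  | "float64" ->
    Ok (handlers.on_float64 (B.Genarray.create B.float64 B.c_layout dims))
  | "float32" ->
    Ok (handlers.on_float32 (B.Genarray.create B.float32 B.c_layout dims))
  | s -> Error ("unknown datatype: " ^ s)
```

Each handler sees the bigarray at its precise type, so no existential or GADT is needed.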
It may help to simplify this problem by not thinking about Bigarrays for a moment. In essence, the problem is the same as this:
let of_datatype = function
| "string" -> "a string"
| "float" -> 3.14
| "int" -> 100
which will not pass the typechecker for the same reason. In OCaml, an expression must be one type or another. If you need to represent different possible types then you’ll have to wrap them in a variant type, apply different continuations to them, or something else like that (as illustrated above).
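For instance, the toy example above type-checks once the results are wrapped in an ordinary variant (the value type and constructor names are made up):

```ocaml
type value =
  | String of string
  | Float of float
  | Int of int

(* Every branch now has the same type, value. *)
let of_datatype = function
  | "string" -> String "a string"
  | "float" -> Float 3.14
  | "int" -> Int 100
  | s -> invalid_arg ("unknown datatype: " ^ s)
```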
I think the real question is, what do you need to do with your function? If you call it with let arr = of_datatype dims str in, then what do you expect arr to be? What type would arr.{0} return? Although there are techniques that can make of_datatype type-check, it’s possible that there’s a deeper issue with the rest of your code that makes it appear to need an ill-typed function.
I tried this but it doesn't work when I try to recover the type of the bigarray. What I'm trying to do is write a library for large chunked and compressed N-dimensional arrays; basically implementing the spec outlined here: Zarr core specification (version 3.0) — Zarr specs documentation. Reading an array chunk from its underlying store involves using the array metadata to decompress the chunk's bytes and decode them in a series of steps, update its values, and then compress it again and write it back to the underlying store.
What I'd like to achieve is to keep metadata about the Bigarray.kind, fill value and shape of the underlying chunk so that I can use that information to properly decode the array bytes. However, this approach hides the Bigarray kind, and when I try to recover it using pattern matching, the information is lost and I get errors about the type escaping its scope.
Is there a way I can reliably store the metadata of the Bigarray type so I can use it later when encoding/decoding the array bytes stored to disk (or wherever)?
The information is not lost, but you need to be careful in the way you pack it, and you can only unpack an existential and use it in a limited scope. Besides, you may have to write some of the function types using the packed information explicitly. It's difficult to help you without a minimal example that exhibits the problem you are facing.
The attempt was to pass in this type: zarr-ml/lib/common.ml at beb82892cd13ce79df2a155b556f4d412a9c0bf3 · zoj613/zarr-ml · GitHub when decoding the array bytes so that the decoding logic can have access to the shape, Bigarray.kind and fill_value of the array. But all the techniques I tried lead to type check errors, and packing the existential types seems to not help at all.
Do you have any suggestion on how I can simplify things? It seems like working with Bigarrays is a pain but maybe it’s a skillset issue??!
You are going against the grain of the library by trying to use the bigarray type as a single type rather than a family of types, which requires a lot of GADTs.
Since it doesn't sound like you ever use the ability to distinguish between the array types, how about using a type that merges together all the bigarray types that you use:
open Bigarray
type barray =
  | Char of (char, int8_unsigned_elt, c_layout) Genarray.t
  | Int of (int, int_elt, c_layout) Genarray.t
  | Float of (float, float64_elt, c_layout) Genarray.t
Then you can create a metadata type that holds the information on how to fill the arrays:
module K = struct
  type fillable =
    | Char of char
    | Int of int
    | Float of float
end
and use it to create a barray of the correct kind without GADTs
let create_and_fill fkind shape =
  let fcreate k filler =
    let b = Genarray.create k c_layout shape in
    Genarray.fill b filler;
    b
  in
  match fkind with
  | K.Char c -> Char (fcreate char c)
  | K.Int i -> Int (fcreate int i)
  | K.Float f -> Float (fcreate float64 f)
with no GADTs in sight.
Using GADTs is a way to have more fine-grained control over the weaving of type information and data layout, but at the cost of an increase in complexity.
If you wish to go on the GADT path, it would be easier to help with specific examples.
Without such examples, a few rules of thumb when working with GADTs are:
- write the type of your functions first;
- existential quantification lets you make type information local. Local information cannot be made global, but it can be compared to global information.
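To illustrate the second rule of thumb: a kind packed in an existential cannot escape, but it can still be compared against a globally known kind inside a match. A sketch, where the any_kind type and the function name are made up:

```ocaml
(* Pack a Bigarray.kind existentially. *)
type any_kind = Kind : ('a, 'b) Bigarray.kind -> any_kind

(* The hidden type cannot leak out of the match, but matching on the
   GADT constructor refines the types locally within each branch. *)
let is_float64 any =
  match any with
  | Kind Bigarray.Float64 -> true
  | Kind _ -> false
```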
This seems to get rid of the scope errors I was getting…but now I have a slightly different problem with the chained Result monads. I get a type check error when trying to compile the code:
File "lib/store.ml", lines 98-105, characters 8-60:
98 | ........let* b = get chunkkey t in
99 | let* arr = if String.(equal b empty) then
100 | Ok (Ndarray.create kind shape fill_value)
101 | else
102 | Chain.decode chain repr b
103 | in
104 | List.iter (fun (coord, y) -> Ndarray.set arr coord y) vals;
105 | Result.map (set t chunkkey) (Chain.encode chain arr).....
Error: This expression has type
(unit,
[> `Bytes_decode_error of key
| `Bytes_encode_error of key
| `Crc32c_decode_error of key
| `Crc32c_encode_error of key
| `Gzip of Ezgzip.error
| `Invalid_byte_range of key
| `Key_not_found of key
| `Transpose_decode_error of key * int * int
| `Transpose_encode_error of key * int * int ])
result
but an expression was expected of type unit
My guess is that this is because the iter function returns unit and not a result value. I am not sure how to compose this imperative piece of code with the rest of the monad chain in the set_array function. Is there a pattern I need to adopt that I'm not aware of yet?
If the iter function cannot fail, you can just fix the type of perform. If a step of an iteration can fail, then iter is not the right function. You could use a fold to implement a variant of:
let iter_until f a =
  Array.fold_left (fun acc x -> let* () = acc in f x) (Ok ()) a
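A self-contained version of that idea, with the let* binder for results defined locally (the function and the example predicate are illustrative):

```ocaml
let ( let* ) = Result.bind

(* Stops applying f after the first Error: the fold still walks the
   remaining elements, but only threads the error value through. *)
let iter_until f a =
  Array.fold_left (fun acc x -> let* () = acc in f x) (Ok ()) a
```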
This got rid of the error…but now I have a new one regarding scope:
File "lib/store.ml", line 124, characters 62-65:
124 | Result.map (set t chunkkey) (Chain.encode chain arr)) tbl (Ok ()) in
^^^
Error: This expression has type (int array * a/2) list Arraytbl.t
but an expression was expected of type (int array * a) list Arraytbl.t
The type constructor a would escape its scope
File "lib/store.ml", lines 112-124, characters 8-73:
Definition of type a
File "lib/store.ml", lines 92-132, characters 6-58:
Definition of type a/2
I tried passing tbl in as an argument and also explicitly annotating its type, but this made the expression Ndarray.to_array x on line 111 produce the same scoping error. Not sure how to proceed.
The error message is telling you that the perform function is not polymorphic, since it is using a table with a type a coming from the binding in the set_array function. In particular, there is no reason to assume that the type a from the kind argument of perform has anything to do with the type a of the Ndarray.
I would recommend avoiding nesting functions with universal quantifications, and if you do so, it is clearer to use different type names:
let set_array: type a b.
Path.t -> Owl_types.slice -> (a, b) Ndarray.t -> t
-> (unit, [> set_error]) result
= ...
let perform
: type elt tag. (elt, tag) Bigarray.kind -> elt
-> (unit, [> set_error]) result
= ...
However, in this case the problem starts with the type:
let set_array: type a b.
Path.t -> Owl_types.slice -> (a, b) Ndarray.t -> t
-> (unit, [> set_error]) result
With this type, you are promising that the function set_array works with any types a and b, with no information on those types. This can only work if set_array avoids any operation that requires knowing those types, which is not the case here.
To avoid this conundrum, you need to add some runtime information about the Ndarray that you are manipulating. For instance, you could add a kind argument to convey this information.
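A minimal sketch of passing the kind as a runtime witness; the function name and the fill argument are illustrative, not the actual set_array signature:

```ocaml
(* The (a, b) Bigarray.kind value carries the type information at
   runtime, so the locally abstract types a and b can be used
   concretely inside the function body. *)
let create_filled : type a b.
  (a, b) Bigarray.kind -> a -> int array
  -> (a, b, Bigarray.c_layout) Bigarray.Genarray.t
  = fun kind fill shape ->
    let arr = Bigarray.Genarray.create kind Bigarray.c_layout shape in
    Bigarray.Genarray.fill arr fill;
    arr
```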
But once again, it is not clear what you gain in your use case by piling up GADTs, compared to replacing the ndarray type by the simpler barray variant that I showed you previously.
The set_array function is user-facing: the caller passes in a bigarray whose values get written to the chunked array in the underlying storage. The Owl_slicing.slice type gives information about where in the chunked array to write the values. Wouldn't hiding the array information by requiring the caller to wrap the Ndarray in a darray type make the API a bit hard to use?
In any case, I updated the implementation to make use of the information provided by the input array x and got rid of the scoping errors. Basically, I extracted the kind and fill_value from it. The fill_value isn't exactly used, so it's just a "stub" used to instantiate an array_repr record. Here is the updated implementation: zarr-ml/lib/store.ml at 72d22d0bd37d2014d48df54f59b4d24f4a2217fc · zoj613/zarr-ml · GitHub
Ah, I did forget that Bigarray provides a kind function. However, beware that your code is wrong for zero-sized arrays, and you should not use to_array, which performs a full copy of the array, just to extract the first element.
Also, since Bigarray provides a kind function, you can write a ('a, 'b) ndarray -> any_ndarray function and only use the any_ndarray type internally.
I would advise against optimizing your API for users you don't yet have before you have a working library. It will be easier to tune the exposed API once your library is working and you can identify potential pain points.
To me, this looks like you should simply provide a function that translates the string to a type.
So “float64” → Float64, “int8” → Int8 etc.
Then the user just has to call exactly the function that returns a bigarray of exactly the wanted kind. Say bigarray_f64, bigarray_i8, etc.
The user of your lib knows what he wants. Just provide him the services to simply get that. If the user is undecided what he wants, it is his turn to make up his mind.
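Concretely, that could be as simple as exposing one creation function per kind, so a caller who knows what they want gets a precisely typed bigarray. The names below follow the suggestion above; the exact API is up to you:

```ocaml
module B = Bigarray

(* Each function returns a concretely typed Genarray; no wrapping needed. *)
let bigarray_f64 dims = B.Genarray.create B.float64 B.c_layout dims
let bigarray_f32 dims = B.Genarray.create B.float32 B.c_layout dims
let bigarray_i8 dims = B.Genarray.create B.int8_signed B.c_layout dims
```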
Thank you for this piece of advice, it helped me make good progress since I last posted here. I now got the set_array and get_array functions working well. I just need to start thinking about how to include concurrency for greater performance since array “chunks” are independent from each other and can be processed in parallel.