I’m trying to write a deep learning inference module in OCaml as a large project, for learning purposes, but I’m not sure where to start with the types. I want to use it to create shared libraries from ONNX models. I’ll work on optimizing the graph later; right now I need a basic example working with the Bigarray module. I need to define a type such as `tensor` or `array` on top of Bigarray and extend it with functions as I go. I have some experience writing a tensor library in C++, where it was very straightforward to work with pointers to the data and create a class with additional information like shape, number_of_dims, etc.
How do I create a record type with the following fields:
type tensor = {
mutable dtype : datatype;
mutable data : Bigarray.Genarray type;
mutable shape : Bigarray.Array1 maybe
mutable strides : Bigarray.Array1 maybe
mutable ndims : int
}
Then I could define functions on top of it.
I tried to get the type of a Bigarray and it’s the following:
let x = Bigarray.Genarray.create Bigarray.float32 Bigarray.fortran_layout [|2;3;4|];;
x;;
- : (float, Bigarray.float32_elt, Bigarray.fortran_layout)
Bigarray.Genarray.t
= <abstr>
I don’t know how to use it inside the record.
An alternative would be to write it in C, so that I have full control over the data type and can possibly make it work on a GPU. If anyone could help I would really appreciate it. Is there another library I could use for storing the data instead of Bigarray? Bigarray seems like the ideal choice. I’m not concerned with having the fastest implementation possible, so anything that works in OCaml would be fine.
(*Module Tensor*)
module Tensor = struct
  type ('elt, 'kind, 'layout) t = {
    mutable data : ('elt, 'kind, 'layout) Bigarray.Genarray.t;
  }
  (*I'm not sure how to do this to create a new array*)
  let create (elt : 'elt) (kind : 'kind) (layout : 'layout) (dims : int array) : ('elt, 'kind, 'layout) t =
    let created_data = Bigarray.Genarray.create elt layout dims in
    {data = created_data}
end
I get the following error:
File "lib/nnopt.ml", line 15, characters 14-26:
15 | {data = created_data}
^^^^^^^^^^^^
Error: The value created_data has type ('a, 'b, 'c) Bigarray.Genarray.t
but an expression was expected of type
(('a, 'b) Bigarray.kind, 'kind, 'c Bigarray.layout)
Bigarray.Genarray.t
The type variable 'a occurs inside ('a, 'b) Bigarray.kind
val create : ('a, 'b) Bigarray.kind ->
'c Bigarray.layout -> int array -> ('a, 'b, 'c) t
That means Bigarray.Genarray.create takes 3 arguments of the following types:
('a, 'b) Bigarray.kind (representing the interfacing and internal element types)
'c Bigarray.layout (representing the layout)
int array (the dimensions)
Here 'a is the OCaml type for accessing the elements, 'b is the internal “element type” (referred to as “kind” by the documentation), and 'c is the layout (C vs Fortran).
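To make the three type parameters concrete, here is a minimal sketch that creates a float32 array with the C layout and reads one element back. The binding name `x` and the chosen dimensions are only illustrative. Bigarray is part of the standard library in recent compilers; older setups may need to link the `bigarray` library explicitly.

```ocaml
open Bigarray

(* 'a = float        : the OCaml type used to read and write elements
   'b = float32_elt  : how elements are stored internally
   'c = c_layout     : row-major storage with 0-based indices *)
let x : (float, float32_elt, c_layout) Genarray.t =
  Genarray.create float32 c_layout [| 2; 3; 4 |]

let () =
  (* A freshly created bigarray is uninitialized, so fill it first. *)
  Genarray.fill x 0.0;
  Genarray.set x [| 1; 2; 3 |] 42.0;
  Printf.printf "%g\n" (Genarray.get x [| 1; 2; 3 |])
```

Note that the kind value `float32` carries both 'a and 'b at once, which is why `create` takes a single kind argument rather than separate element and kind arguments.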
So when I combine your elt and kind arguments, the following code compiles for me:
(*I'm not sure how to do this to create a new array*)
- let create (elt : 'elt) (kind : 'kind) (layout : 'layout) (dims : int array) : ('elt, 'kind, 'layout) t =
+ let create (elt_kind : ('elt, 'kind) Bigarray.kind) (layout : 'layout Bigarray.layout) (dims : int array) : ('elt, 'kind, 'layout) t =
- let created_data = Bigarray.Genarray.create elt layout dims in
+ let created_data = Bigarray.Genarray.create elt_kind layout dims in
{data = created_data}
(*Module Tensor*)
module Tensor = struct
  type ('elt, 'kind, 'layout) t = {
    mutable data : ('elt, 'kind, 'layout) Bigarray.Genarray.t;
  }
  (*I'm not sure how to do this to create a new array*)
  let create (elt_kind : ('elt, 'kind) Bigarray.kind) (layout : 'layout Bigarray.layout) (dims : int array) : ('elt, 'kind, 'layout) t =
    let created_data = Bigarray.Genarray.create elt_kind layout dims in
    {data = created_data}
end
This does what I want but I ended up writing the following code:
(*Module Tensor*)
module Tensor = struct
  type ('elt, 'kind, 'layout) t = {
    mutable length : int;
    mutable shape : int array;
    mutable strides : int array;
    mutable data : ('elt, 'kind, 'layout) Bigarray.Array1.t;
  }
  let create (elt : 'elt) (layout : 'layout) (dims : int array) =
    let length = Shape.length dims in
    let c_layout = Layout.C_layout in
    let strides = Shape.compute_strides c_layout dims in
    let created_array = Bigarray.Array1.create elt layout length in
    {length = length; shape = dims; strides = strides; data = created_array}
  (*Get the element at dims...*)
  let get (x : ('elt, 'kind, 'layout) t) (dims : int array) : 'elt =
    let index = Shape.linear_index dims x.strides in
    Bigarray.Array1.get x.data index
  (*Get the element at linear index*)
  let at (x : ('elt, 'kind, 'layout) t) (index : int) : 'elt =
    Bigarray.Array1.get x.data index
end
I did this because I needed the `at` function to access an element by its linear index; without it, Genarray is difficult to use.
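The `Shape` module used above is not shown in the thread. As a rough sketch of what such helpers might look like for a row-major (C layout) tensor, the following is a hypothetical stand-in. The function names mirror the ones called above, but the signatures are assumptions; in particular, `compute_strides` here takes only the dimensions and always produces C-layout strides.

```ocaml
(* Hypothetical stand-in for the Shape module referenced in the thread. *)
module Shape = struct
  (* Total number of elements: the product of all dimensions. *)
  let length (dims : int array) : int = Array.fold_left ( * ) 1 dims

  (* Row-major strides, counted in elements: the last dimension has
     stride 1, and each earlier stride is the product of the later dims. *)
  let compute_strides (dims : int array) : int array =
    let n = Array.length dims in
    let strides = Array.make n 1 in
    for i = n - 2 downto 0 do
      strides.(i) <- strides.(i + 1) * dims.(i + 1)
    done;
    strides

  (* Linear offset of a multi-index: its dot product with the strides. *)
  let linear_index (index : int array) (strides : int array) : int =
    let acc = ref 0 in
    Array.iteri (fun i k -> acc := !acc + (k * strides.(i))) index;
    !acc
end
```

For dimensions `[|2; 3; 4|]` this gives strides `[|12; 4; 1|]`, so the multi-index `[|1; 2; 3|]` maps to 12 + 8 + 3 = 23, the last of the 24 elements.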
It looks like you are essentially duplicating the Genarray module. Is there a reason why?
Typically, your create function is
let create elt_kind layout dims =
  Bigarray.Genarray.create elt_kind layout dims
let c = create Float64 C_layout [|2;2;2;2|]
let one =
  c.{0,0,0,0} <- 1.;
  c.{0,0,0,0}
Also note that you can define your own indexing operator with
let (.!()) x indices =
  let index = Shape.linear_index indices x.strides in
  Bigarray.Array1.get x.data index
let example x = x.!([|0|])
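The same mechanism can also be applied directly to a Genarray. The sketch below defines a hypothetical pair of operators, `.%()` and `.%()<-`, that read and write by flat index through a one-dimensional view obtained with `Bigarray.reshape_1`. The operator names and the `flat` helper are illustrative choices, and the 0-based indices shown assume the C layout (with the Fortran layout the flat view is 1-based).

```ocaml
open Bigarray

(* One-dimensional view of the same memory; no elements are copied. *)
let flat g = reshape_1 g (Array.fold_left ( * ) 1 (Genarray.dims g))

(* Hypothetical flat-index read and write operators. *)
let ( .%() ) g i = Array1.get (flat g) i
let ( .%()<- ) g i v = Array1.set (flat g) i v

let () =
  let g = Genarray.create float64 c_layout [| 2; 3 |] in
  Genarray.fill g 0.0;
  (* Flat index 4 in a 2x3 row-major array is row 1, column 1. *)
  g.%(4) <- 7.0;
  Printf.printf "%g %g\n" g.%(4) (Genarray.get g [| 1; 1 |])
```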
Formally speaking, it can be meaningful to define a tensor as something more than a multidimensional array. In certain applications, particularly in geometry, it is necessary to specify additional structure, such as the vector space and the basis with respect to which the tensor components are expressed. However, in the context of deep learning and neural networks, this level of abstraction is usually not required, and tensors are commonly identified with multidimensional arrays.
I did not know about the “Extended indexing operators”, thanks for the info!
This is mostly a matter of refining the API to define which tensor operations are well-typed, and it can be reflected at the type-level without affecting the array storage layer. Thus reusing Genarray.t as the concrete implementation of the underlying tensor data can be a good first step.
My issue with Genarray is that it doesn’t have a get function that takes a linear index and returns the value.
For example, the in-memory representation of a C-layout array is the following:
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
I would like to access the values by the indices [0..5], so 0→1, 1→2, … and 5→6.
This would also let me write generic code for elementwise operations, since I wouldn’t have to deal with the number of dimensions at all. Instead of writing
for i = 0 to ndim1 do
  for j = 0 to ndim2 do
    for k = 0 to ndim3 do
      ....
I could use one loop for defining custom functions.
You can write an iter function either by converting the Genarray.t array to an Array1.t (which is a zero-copy operation) or by iterating over the shape array:
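As a minimal sketch of the first option, `Bigarray.reshape_1` returns a one-dimensional view that shares memory with the original array, so a single loop covers every element regardless of the number of dimensions. The helper names `flatten`, `map_inplace` and `fold` are hypothetical, and the loops assume the C layout, where flat indices run from 0 to n - 1.

```ocaml
open Bigarray

(* Zero-copy 1-D view over the same memory as the Genarray. *)
let flatten g = reshape_1 g (Array.fold_left ( * ) 1 (Genarray.dims g))

(* Apply f to every element in place, using a single loop. *)
let map_inplace (f : 'a -> 'a) (g : ('a, 'b, c_layout) Genarray.t) : unit =
  let flat = flatten g in
  for i = 0 to Array1.dim flat - 1 do
    Array1.set flat i (f (Array1.get flat i))
  done

(* Left fold over all elements in memory order. *)
let fold (f : 'acc -> 'a -> 'acc) (init : 'acc)
    (g : ('a, 'b, c_layout) Genarray.t) : 'acc =
  let flat = flatten g in
  let acc = ref init in
  for i = 0 to Array1.dim flat - 1 do
    acc := f !acc (Array1.get flat i)
  done;
  !acc
```

Because the view aliases the original storage, writes made through it are visible through the Genarray as well.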
open Bigarray
(* Advance the multi-index [shape] in place, first dimension fastest;
   return false once it has wrapped around past the last element. *)
let next dims shape =
  let rec incr dims shape pos =
    if pos >= Array.length shape then false
    else if 1 + shape.(pos) >= dims.(pos) then begin
      shape.(pos) <- 0;
      incr dims shape (pos+1)
    end else begin
      shape.(pos) <- shape.(pos) + 1;
      true
    end
  in incr dims shape 0
(* Apply [f] to every element; assumes a non-empty array with
   0-based (C layout) indices. *)
let iter f array =
  let dims = Genarray.dims array in
  let shape = Array.map (Fun.const 0) dims in
  f (Genarray.get array shape);
  while next dims shape do f (Genarray.get array shape) done