When to use iarray instead of array?

Stumbling upon this other thread, I realized that OCaml has support for immutable arrays:

And indeed, those are part of OCaml 5.4: module Iarray and also a type iarray (for which I don’t seem to find an entry in the manual here; maybe it is missing in the manual?).

It doesn’t have an indexing operator (#13784), but you can define one.

I struggle a bit when and how to use that new type. In my case, I have written a solver for linear equation systems using the Gauss method. I would like the types to be something like:

val solve : (Num.t iarray * Num.t) list -> Num.t array option

with some generic number module Num.

I don’t want the function to modify/destroy the input arguments. So I assume changing my function to accept an iarray is fine? But while I’m calculating the output, as a last step of the Gauss algorithm, I keep the numbers in a mutable array anyway. So is it reasonable to return that mutable array instead of converting it into an immutable array? (Afterall, some future users of my function might want to modify the result.)

Or should I stick to one kind of array and use it consistently for input and output?

What are other people’s experiences with using iarray? I noticed that refactoring my code to use iarray instead of array is quite a lot of work, especially as I will have to do that replacement in some places, while in others I need to keep mutability. So I would need to duplicate some code, e.g. here in my tests (using Alcotest):

 (** Tests rational numbers ([Q.t]). *)
 let testable_q : Q.t Alcotest.testable = Alcotest.testable Q.pp_print Q.equal
 
 (** Tests rational vectors (i.e. arrays of rational numbers). *)
 let testable_qvec = Alcotest.array testable_q
+
+(** Tests immutable rational vectors. *)
+let testable_qivec =
+  Alcotest.testable
+  (fun f x -> Format.pp_print_array Q.pp_print f (Iarray.to_array x))
+  (Iarray.for_all2 Q.equal)

Also note that there doesn’t seem to be an Alcotest.iarray, so I have to convert my iarray first to an array before Alcotest is able to process it. Either by providing a corresponding testable as above, or by manually converting all iarrays into array before letting Alcotest process them.

Is this friction going to be better within time (possibly leading to more duplication in libraries?), or is iarray more to be seen as a special type that you only use when you really need it, and otherwise stick to array and/or making types abstract when you want to avoid access from outside?

1 Like

The intention is that people should rather use Iarray.t, and iarray is a built-in type that is more an implementation detail that may disappear from the interface. (In fact, we should rewrite this API to expose 'a Iarray.t rather than 'a iarray).

The intention of immutable arrays is mostly to have better APIs: documenting that functions do not modify their arguments for instance.

So it’s up to you and to how you think your API should be used. If it’s convenient for your users to be able to modify the result of a function, maybe you should use a mutable array.

Note that Dynarray.unsafe_to_iarray exists, i.e., turning a mutable (and dynamically-sized) array to an iarray without a copy (under some conditions).

Since Iarray.t is in the stdlib, I think it would make sense to suggest to the alcotest maintainers to add Alcotest.iarray.

1 Like

Oh, that is good to know.

Yeah, the documentation of the Iarray module doesn’t just mention iarray in type 'a t, but iarray is mentioned in every function there, e.g. iter. If iarray is really an implementation detail, it’s somewhat misleading as of yet.

So thank you for letting me know.

Reading in the comments on the corresponding pull-request #13097, I found some thought by @charguer that seems to be contrary to that:

charguer commented on May 14, 2024:

[…]

The notion of immutable arrays is useful as a typing information. Sometimes, you’d like the array to be unmodified by a function, but subsequently recover the ability to modify it. For that, you need more than types, something like separation logic permissions. […]

Specifically, say an algorithm X creates an array that it needs to modify and – during its runtime at several stages – algorithm X calls several functions (f1, f2, …) that do not modify the array. If those functions use Iarray.t to document that they don’t modify their argument, then calling Iarray.to_array will require X to create a copy each time of the array (unless we use some unsafe magic, I think?).

On the other hand, if another algorithm Y works with an array that it does not need to modify an array, and if (f1, f2, …) were designed such that they accept an array instead of an iarray (so that they are more performant when used by algorithm X), then using f1, f2, … in algorithm Y would require to needlessly create a mutable array for each invocation of f1, f2, …

This is because Iarray.t (to my understanding) gives more guarantees than “this function will not modify its argument”. It also reflects a guarantee by the caller that the array won’t be modified, so we have a two-sided guarantee here. Passing an argument a : Iarray.t to a function f guarantees that:

  • Function f will not modify a.
  • No parallel task will modify a.
  • No functions invoked during f’s runtime will modify a.
  • No functions invoked after f returned will modify a.

I cannot just express the first guarantee (function f will not modify a) by replacing an array argument with an Iarray.t argument without additionally requiring the latter three guarantees by the caller as well, which causes (in my example above) X to create a copy each time of the array.

This concurs with some in the remainder of the above cited comment in the PR:

The notion of immutable arrays is useful to enable compiler optimizations […]


It only exists for Dynarray but not for IarrayArray? Is there a reason for that? (edit: I thinkI found the answer below.) (Generally, I would probably not want to use unsafe functions for my use-case, but I’m curious.)

The following isn’t meant as a suggestions how to idiomatically write code, but to understand the underlying nature of “I accept an array that I won’t modify” in an as-abstract-as-possible way (i.e. without demanding additional guarantees made by the caller as pointed out in my previous post). I think the answer is abstract types. I would probably express it as follows:

module type Parray = sig
  type 'a t

  val of_array : 'a array -> 'a t
  val of_iarray : 'a Iarray.t -> 'a t
  val to_array : 'a t -> 'a array
  val to_iarray : 'a t -> 'a Iarray.t
  val get : 'a t -> int -> 'a
end

module Parray1 : Parray = struct
  type 'a t = 'a array

  let of_array = Fun.id
  let of_iarray = Iarray.to_array
  let to_array = Fun.id
  let to_iarray = Iarray.of_array
  let get = Array.get
end

module Parray2 : Parray = struct
  type 'a t = 'a Iarray.t

  let of_array = Iarray.of_array
  let of_iarray = Fun.id
  let to_array = Iarray.to_array
  let to_iarray = Fun.id
  let get = Iarray.get
end

module M (Parr : Parray) = struct
  let algorithm : int Parr.t -> int =
    fun arr ->
      Parr.get arr 0 + Parr.get arr 1
end

module M1 = M (Parray1)
module M2 = M (Parray2)

let _ =
  let a1 : _ array = [| 12; 13; 102 |] in
  assert (M1.algorithm (Parray1.of_array a1) = 25);
  assert (M2.algorithm (Parray2.of_array a1) = 25);
  let a2 : _ Iarray.t = [| 12; 13; 102 |] in
  assert (M1.algorithm (Parray1.of_iarray a2) = 25);
  assert (M2.algorithm (Parray2.of_iarray a2) = 25)

Here, the algorithm (algorithm) is provided as part of a functor, allowing it to work both with array and Iarray.t, depending on what the caller provides.

In particular, if a caller already happens to have a mutable array, we can instantiate the functor such that converting the mutable array into the abstract type is a no-op: Parray1.of_array a1 is simply a1. Accordingly, if a caller happens to have an immutable array, we can instantiate the functor such that converting the immutable array into the abstract type is a no-op: Parray2.of_iarray a2 is simply a2.

Now I don’t want to propose providing a generic interface like Parray above. But what I wonder is: Perhaps I should consider whether I’m using array or Iarray.t as an implementation detail that should not be exposed in my API anyway and instead use abstract types (as also suggested to me in another context).

Getting back to my Gauss solver, perhaps an equation should be an abstract type that can be constructed by lists, mutable arrays, immutable arrays, etc. And my implementation decides when or if a conversion is required. Something like:

module type Number = sig
  type t
  (* ... *)
end

module Equation (Num : Number) : sig
  type t

  val make : Num.t Seq.t -> Num.t -> t
  val make_array : Num.t array -> Num.t -> t
  val make_iarray : Num.t Iarray.t -> Num.t -> t
end = struct
  (* The choice for [t] might have an effect on performance,
     depending on whether users of the API primarily use
     [make_array] or [make_iarray]. *)
  type t = Num.t Iarray.t * Num.t

  let make lhs rhs = (Iarray.of_seq lhs, rhs)
  let make_array lhs rhs = (Iarray.of_array lhs, rhs)
  let make_iarray lhs rhs = (lhs, rhs)
end

I feel like choosing array vs Iarray.t is not really a matter of designing interfaces but rather a matter of what kind of implementation of arrays I want to (or should) use in a specific scenario. Any thoughts on that?


P.S.: Note that in my example above, make_array could or could not demand in its documentation/contract that the array is held constant while the resulting Equation.t is used.

The documentation also does not indicate that the Iarray module is available since 5.4 unlike, say, Dynarray (since 5.2), Seq (since 4.07) or Bytes (since 4.02). It would be nice to have consistency.

I guess that for unsafely converting an array to an Iarray.t, I simply would use Obj.magic, e.g.:

let unsafe_array_to_iarray : 'a array -> 'a Iarray.t = Obj.magic
let unsafe_iarray_to_array : 'a Iarray.t -> 'a array = Obj.magic

Re-thinking about this, I think when not modifying the input, I have more choices than just Iarray.t and array:

  • use array,
  • use Iarray.t,
  • do not expose the type, i.e. use generic types such that the implementation isn’t exposed,
  • use a different type such as lists or sequences.

I feel like which of those choices is best depends on the specific case. I do not think using Iarray.t is the general answer to “my function doesn’t modify the argument”. In my case, if I copy the argument as part of my algorithm, maybe even a sequence is the better choice over an immutable array.

The documentation also does not indicate that the Iarray module is available since 5.4

Hmm… yet there is a @since annotation at the top of the file. Maybe the doc generator is bamboozled by the open! annotation.

@jbe Yes, your safest bet is to expose an abstract type and a minimal API.

Unsafely, yes.

I assume you mean for Array. Well, a dynarray that doesn’t change size is essentially an array, so the dynarray function seems to mostly subsume that use case.

I made a corresponding proposal:

1 Like

Yes, my apologies, I did add an edit to my post above to avoid confusion:

What I meant is: Why isn’t the a function that unsafely converts an Array into an Iarray, but as I noticed above, there is Obj.magic, which does the job already.

What I meant is: Why isn’t the a function that unsafely converts an Array into an Iarray, but as I noticed above, there is Obj.magic, which does the job already.

Yes, and exposing a function that does Obj.magic, even with “unsafe” in the name, would be questionable because the situations where it’s safe to use would depend on the optimizations performed. So we may as well let people use Obj.magic and know they’re on their own.

I am guessing that a possible use of a specialized function is to change the type of the array without being able the change the type of the elements. You can annotate your use of Obj.magic for that, but its not by default.

Note that the only job that Obj.magic is doing is breaking your code. The Obj.magic function is not part of the language, and using it means giving free reign to compiler optimisations to break your code. And lying over the mutability of data is a particularly dangerous game.

3 Likes

Yeah, thanks for pointing that out. I was also misunderstanding how Dynarray.unsafe_to_iarray works. I thought it converts the types (under certain conditions), but actually it calls a function that must exhibit specific behavior for the whole procedure to be sound.

I don’t plan to use Obj.magic any time soon anyway, but thanks for the warning and notice that Obj.magic is not suitable to convert arrayIarray.t.

Please note that array and iarray are fundamentally incompatible types. They happen to share implementation details, but iarrays are not arrays that are exposed as immutable.
In particular, a function that takes an iarray must be able to rely on the fact that nobody else can mutate the argument.
If you want to document that some functions do not mutate their arguments but still need to use mutable arrays for the rest of your code, my advice is to define an ArrayView module like this:

module ArrayView : sig
  type 'a t
  val from_array : 'a array -> 'a t
  val get : 'a t -> int -> 'a
  ...
end = struct
  type 'a t = 'a array
  let from_array a = a
  let get = Array.get
  ...
end

This module does not export a to_array function, and no mutating functions either. This means that by passing ArrayView.from_array a to a function, you have the guarantee that the function will not be able to mutate the array but you can keep the ability to mutate it yourself safely.

3 Likes

I think that roughly corresponds to my Parray1 example above (for which I also provided an abstraction Parray, which generalizes over non-modified arrays and Iarray.ts). But I guess it’s overkill in most scenarios, and such invariants can better be documented as part of the documentation rather than using the type system, I guess. Afterall, OCaml can’t fully reflect side-effects in its type system anyway.