Performance difference between bigarray inside record and plain bigarray access

eWert-Online · August 15, 2021, 4:37pm

Hello

today I noticed a behavior I don’t have an explanation for.
Maybe some of you can explain, why this happens.

Using the following code, the Io implementation where the bigarray is packed inside a record runs segnificantly faster than the one accessed directly.
To me this is really counter intuitive.

open Bigarray

type 'a img = {width: int; height: int; image: 'a}

type data = (int32, int32_elt, c_layout) Array1.t

module type IO = sig
  type t

  val make : unit -> t img

  val read : x:int -> y:int -> t img -> Int32.t
end

module Io_direct : IO = struct
  type t = data

  let make () =
    let width = 4000 in
    let height = 4000 in
    let data = Array1.create int32 c_layout (width * height) in
    Array1.fill data 255l ;
    {width; height; image= data}

  let read ~(x : int) ~(y : int) img =
    Array1.unsafe_get img.image ((y * img.width) + x)
end

module Io_record : IO = struct
  type t = {data: data}

  let make () =
    let width = 4000 in
    let height = 4000 in
    let data = Array1.create int32 c_layout (width * height) in
    Array1.fill data 255l ;
    {width; height; image= {data}}

  let read ~(x : int) ~(y : int) img =
    Array1.unsafe_get img.image.data ((y * img.width) + x)
end

let run name image read =
  let start = Unix.gettimeofday () in
  let () =
    for y = 0 to image.height - 1 do
      for x = 0 to image.width - 1 do
        let _ = read ~x ~y image in
        ()
      done
    done
  in
  let ms = (Unix.gettimeofday () -. start) *. 1000. in
  Printf.printf "%s io took: %f ms \n" name ms

let () =
  let image = Io_direct.make () in
  run "direct" image Io_direct.read

let () =
  let image = Io_record.make () in
  run "record" image Io_record.read

The output it generates is

direct io took: 445.538998 ms 
record io took: 66.195965 ms

Can you explain why the “record io” is so much faster?
Is there some optimization which only runs on records?

Thank you!

dinosaure · August 15, 2021, 5:08pm

Wrap a value into a record adds an indirection. That mean that if you want an access to the underlying bigarray, an extra operation is needed (to dereference the pointer inside the record and finally load the bigarray).

You can see such indirection in assembler for this snippet for instance:

type t = { v : int }

let f0 t = t.v
let f1 v = v

And in assembler (x86_64):

camlExample__f0_8:
        movq    (%rax), %rax
        ret
camlExample__f1_15:
        ret

However, it exists a way to “optimize” your record. Indeed, in some situation, you need to wrap a value into a record, specially when you want the universal quantification (from the type system). In such case, you can use [@@unboxed] - be aware that it changes the caml representation then (so you need to be aware about such detail when you want to do some FFI stuffs).

type t = { v : int } [@@unboxed]

let f0 t = t.v

And in assembler:

camlExample__f0_8:
        ret

eWert-Online · August 15, 2021, 5:19pm

Thanks for your reply, but I think you misread the question

The one wrapped inside a record is faster. I would also think that accessing the value inside the record and then accessing the bigarray would be slower.
Thats why it is so counter intuitive to me

dinosaure · August 15, 2021, 5:20pm

Ah yes, interesting case so you give me a rabbit hole!

dinosaure · August 15, 2021, 5:48pm

So I take a look on the assembly generated and it seems that OCaml do something smart when it manipulate a record. A simple call to Array.unsafe_get emits a call to caml_ba_get_1 as expected, However, with a record, the assembly is a bit different when, in this path, we know that we manipulate an int32.

I think it’s mostly due to the int32 and the combination of:

github.com

ocaml/ocaml/blob/e608b7c95af93db18d18104c9347e19ff50b252e/asmcomp/cmm_helpers.ml#L933-L944

    
      
          match (elt_kind : Lambda.bigarray_kind) with
            Pbigarray_complex32 | Pbigarray_complex64 ->
              let kind = bigarray_word_kind elt_kind in
              let sz = bigarray_elt_size elt_kind / 2 in
              bind "addr"
                (bigarray_indexing unsafe elt_kind layout b args dbg) (fun addr ->
                  bind "reval"
                    (Cop(Cload (kind, Mutable), [addr], dbg)) (fun reval ->
                      bind "imval"
                        (Cop(Cload (kind, Mutable),
                             [Cop(Cadda, [addr; Cconst_int (sz, dbg)], dbg)], dbg))
                        (fun imval -> box_complex dbg reval imval)))

And:

github.com

ocaml/ocaml/blob/e608b7c95af93db18d18104c9347e19ff50b252e/asmcomp/cmm_helpers.ml#L1020-L1022

    
      
          Cop(Calloc, [alloc_header_boxed_int bi dbg;
                       Cconst_symbol(operations_boxed_int bi, dbg);
                       arg'], dbg)

In other words, you should need to “specialize” Io_direct.read and its argument img to enforce the fact that it is a data - because Array1.unsafe_get does not constraint it to be a bigarray of int32. By this way, you can let the opportunity to OCaml to specialize the way to load a data from the given bigarray instead of a simple call to caml_ba_get_1 (which is “polymrophic”).

c-cube · August 15, 2021, 5:53pm

I would double-check on godbolt.org/ but my first intuition, based on
recent experiences, is that the record forces a monomorphic type on the
bigarray. Accessing a monomorphic bigarray is just a few assembly
instructions; a polymorphic bigarray will require a C call, I think.

Chris00 · August 15, 2021, 6:45pm

Indeed, the read in Io_direct should fix the type of the bigarray (as given above, it has type x:int -> y:int -> ('a, 'b, 'c) Stdlib__bigarray.Array1.t img -> 'a, implying as @c-cube said, that the code generated goes through a generic C function). One way to fix the type is:

  let read ~(x : int) ~(y : int) img =
    Array1.unsafe_get (img.image: t) ((y * img.width) + x)

The funny thing is that this code is still a couple of ms slower. To make it faster one must “fetch” the bigarray first:

  let read ~(x : int) ~(y : int) img =
    let i : t = img.image in
    Array1.unsafe_get i ((y * img.width) + x)

OCamlUser · August 16, 2021, 2:49am

Wow! I wanted to say thank you. I have a performance sensitive program in OCaml using big array and adding a type annotation sped up the code ~30%! Is there a list of these little hints for optimization anywhere?

EduardoRFS · August 16, 2021, 4:20am

Essentially the lambda primitives which are polymorphic, like compare, =, Array.get and more. An extensive list of them can be found here

github.com

ocaml/ocaml/blob/600aad5d29643598d4bddee346275a51930c7859/lambda/translprim.ml#L118


      
            Hashtbl.fold (fun path _ acc -> path :: acc) used_primitives []
          
          let gen_array_kind =
            if Config.flat_float_array then Pgenarray else Paddrarray
          
          let prim_sys_argv =
            Primitive.simple ~name:"caml_sys_argv" ~arity:1 ~alloc:true
          
          let primitives_table =
            create_hashtable 57 [
              "%identity", Identity;
              "%bytes_to_string", Primitive (Pbytes_to_string, 1);
              "%bytes_of_string", Primitive (Pbytes_of_string, 1);
              "%ignore", Primitive (Pignore, 1);
              "%revapply", Revapply;
              "%apply", Apply;
              "%loc_LOC", Loc Loc_LOC;
              "%loc_FILE", Loc Loc_FILE;
              "%loc_LINE", Loc Loc_LINE;
              "%loc_POS", Loc Loc_POS;
              "%loc_MODULE", Loc Loc_MODULE;

Topic		Replies	Views
Buffered IO, bytes vs bigstring Learning	8	1387	December 20, 2021
Array.map vs List.map performance Learning performance	6	1259	May 14, 2024
`float array` and `floatarray` have similar performance in my benchmark Learning	9	789	February 14, 2023
Question about large (big?)arrays (maybe of int) and garbage collection Learning garbage-collection	5	737	November 27, 2020
Working with a huge data chunks Learning	13	2025	July 5, 2019

Performance difference between bigarray inside record and plain bigarray access

Related topics