Readonly Bigarray

Hi all, glad you are OK :slight_smile:

Let’s say I want to create a really large Bigarray, around 80% of all available physical memory, and fill it with data. I then want to fork 40 children to process the data in this Bigarray. In theory, Linux uses COW (copy-on-write), so as long as I don’t write to the Bigarray, all those processes should work over the same physical memory.

Two questions:

  1. Will they? :wink:
  2. Does OCaml already expose an mprotect interface, or do I need to write it myself?

Thanks in advance for your time!

Given how COW works, I think they will, since it’s mostly an OS thing.

COW will help you share the array data itself, but keep in mind each process will still need its own page tables to index the data. Depending on what else needs to run in the 20% of RAM left, it could be a problem.
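To put a rough number on the page-table cost, here is a back-of-the-envelope sketch. It assumes x86-64 with 4 KiB pages and 8-byte page-table entries, counts only the leaf entries, and assumes no huge pages are in use (with transparent huge pages the figure would be much smaller). The function name and the 100 GiB example size are mine, not from the thread.

```ocaml
(* Assumptions: x86-64, 4 KiB pages, 8-byte page-table entries, no huge pages. *)
let page_size = 4096
let pte_size = 8

(* Approximate bytes of leaf page-table entries needed for [children]
   processes that each map the same [array_bytes] of memory. *)
let page_table_bytes ~array_bytes ~children =
  let pages = (array_bytes + page_size - 1) / page_size in
  pages * pte_size * children

let () =
  let gib = 1024 * 1024 * 1024 in
  let bytes = page_table_bytes ~array_bytes:(100 * gib) ~children:40 in
  Printf.printf "page tables: about %d MiB in total\n" (bytes / (1024 * 1024))
```

Under these assumptions each child needs about 1/512 of the array size for its own page tables, so 40 children add up to roughly 8% of the array size.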


Ian

Even if it is read-only access, I’d be careful about the garbage collector. I did the same thing you are describing in Python (not OCaml) a few years back to share large read-only tensors across many forked children. I was surprised that the COW benefits were lost because pages were modified by Python’s reference counting, even for read-only retrieval. (The Python setup in the Stack Overflow question “Shared-memory and multiprocessing” was fairly similar. I did end up with a solution but can’t remember exactly what it was.)

Good point, thanks.

If you manage to remember even the general approach to the solution, please let me know.

The point about the GC modifying data is valid—the GC can update two bits in the headers of OCaml values—however the values in a Bigarray don’t have a header and are not scanned by the GC, so I think you are safe in this case.

Thanks. I also expect that some of the Bigarray’s pages will be copied, but the majority will be shared.

@otini mentioned you should be safe. For completeness, my Python solution was to make use of the internal memory layout of an ndarray:

  1. in the parent process concatenate many large tensors’ ndarray.data into a single huge data file in /dev/shmem. I found some bugs in Python mmap under multiprocessing, so if my procedure looks complicated there was a reason
  2. in the parent process write out some indexing information (tensor id, file offset, etc.) and ndarray attributes into a control file. This was a production fleet that couldn’t tolerate downtime; using control and data files on /dev/shmem let the forked children load new read-only tensors when they were available
  3. in the forked children use a custom multiprocessing Manager to mmap the single data file into a single Python buffer object and from the control file recreate the many ndarray Python tensors. The ndarray attributes are modified in each forked child:
    a) each ndarray.data has to point to the single mmap’d data file and
    b) each ndarray.strides has to offset where the tensor is in the data file
  4. to make sure the many ndarray tensors were not touched by Python’s reference counter during a fork, I had to rewrite either the fork code or the de/serialization code. This is where I’m forgetting the finer details.

Good luck! Hopefully OCaml makes your real-world use case easier to implement+manage than Python.

Thank you for the explanation. From the description I can see why the GC was an issue: the architecture of your data is much more complicated than mine. In my case, since I work with basically a single plain chunk of memory, the GC effects should be much less severe.

Thanks again.

I think you’re not the only one who got surprised by this. :slight_smile:

  1. Your approach should work.
    Try with a rather small bigarray at first, just to check things in a simple setting
    (e.g. can all processes read it, or parts of it, correctly).
    Then you can try processing your “big data” taking almost all RAM.

  2. Do you really need mprotect if you mmap a file which was open in read-only mode?
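For reference, a minimal sketch of the mmap variant in OCaml, using `Unix.map_file` on a file descriptor opened with `O_RDONLY`. Passing `shared = false` gives a private copy-on-write mapping, so per the `Unix.map_file` documentation any modification stays in memory and the underlying file is not affected. The helper names, the temporary file, and the element type are assumptions for illustration; note that this protects the file, and I am not claiming it makes in-memory writes fault the way mprotect would.

```ocaml
(* Needs the unix library, e.g.: ocamlfind ocamlopt -package unix -linkpkg ro_map.ml *)
open Bigarray

(* Hypothetical helper: create a file holding [n] floats 0.0, 1.0, 2.0, ... *)
let write_file path n =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT; Unix.O_TRUNC ] 0o600 in
  let a = array1_of_genarray (Unix.map_file fd float64 c_layout true [| n |]) in
  for i = 0 to n - 1 do
    a.{i} <- float_of_int i
  done;
  Unix.close fd

(* Map the whole file through a read-only descriptor. shared = false makes
   the mapping private, and -1 lets the dimension be derived from the file size. *)
let map_readonly path : (float, float64_elt, c_layout) Array1.t =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  let ga = Unix.map_file fd float64 c_layout false [| -1 |] in
  Unix.close fd;
  array1_of_genarray ga

let () =
  let path = Filename.temp_file "ro_bigarray" ".bin" in
  write_file path 1024;
  let a = map_readonly path in
  Printf.printf "dim = %d, a.{10} = %.1f\n" (Array1.dim a) a.{10};
  Sys.remove path
```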

Regards,
F.

Hi.

I don’t use mmap. The Bigarray is allocated in memory and then the process is forked the required number of times.

For some reason, mmap severely increases the time it takes a child process to bootstrap.
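The allocate-then-fork pattern described here can be sketched roughly as follows. This is only an illustration under my own assumptions (a float64 array, four workers, each child reporting through its exit code, and `Unix._exit`, which needs OCaml 4.12 or later); it is not the poster’s actual code, which uses Parmap.

```ocaml
(* Needs the unix library, e.g.: ocamlfind ocamlopt -package unix -linkpkg cow.ml *)
open Bigarray

type data = (float, float64_elt, c_layout) Array1.t

(* Read-only pass over the slice [lo, hi). *)
let sum_slice (a : data) lo hi =
  let s = ref 0.0 in
  for i = lo to hi - 1 do
    s := !s +. a.{i}
  done;
  !s

(* Fork [workers] children. Each child sums its own slice of the array
   inherited from the parent and reports through its exit code whether the
   sum equals [expected]. Returns true if every child agrees. *)
let check_in_children (a : data) ~workers ~expected =
  let chunk = Array1.dim a / workers in
  let spawn w =
    match Unix.fork () with
    | 0 ->
      let s = sum_slice a (w * chunk) ((w + 1) * chunk) in
      (* _exit skips at_exit handlers, so the child flushes nothing twice. *)
      Unix._exit (if s = expected then 0 else 1)
    | pid -> pid
  in
  let pids = List.init workers spawn in
  let wait pid =
    match Unix.waitpid [] pid with
    | _, Unix.WEXITED 0 -> true
    | _ -> false
  in
  List.for_all Fun.id (List.map wait pids)

let () =
  let n = 1_000_000 and workers = 4 in
  let a = Array1.create float64 c_layout n in
  Array1.fill a 1.0;
  let ok = check_in_children a ~workers ~expected:(float_of_int (n / workers)) in
  print_endline (if ok then "all children read the parent's data" else "mismatch")
```

The children never write to the array, so the kernel has no reason to copy its pages. Whether the pages really stay shared is best confirmed with OS tools rather than from inside the program.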

Maybe you can create a read-only Bigarray module:
include the Bigarray module in another module whose interface hides all the modifying operations.
Some libraries in opam may already provide this.
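A minimal sketch of that idea for one-dimensional arrays. The module name `Ro_array1` and the choice of exposed functions are hypothetical; the point is only that the abstract type plus a signature without `set`, `fill`, or `blit` prevents writes through this interface.

```ocaml
(* Hypothetical read-only view over Bigarray.Array1. The type is abstract
   and the signature exposes no mutating operation. *)
module Ro_array1 : sig
  type ('a, 'b, 'c) t
  val of_array1 : ('a, 'b, 'c) Bigarray.Array1.t -> ('a, 'b, 'c) t
  val dim : ('a, 'b, 'c) t -> int
  val get : ('a, 'b, 'c) t -> int -> 'a
  val sub : ('a, 'b, 'c) t -> int -> int -> ('a, 'b, 'c) t
end = struct
  type ('a, 'b, 'c) t = ('a, 'b, 'c) Bigarray.Array1.t
  let of_array1 a = a
  let dim = Bigarray.Array1.dim
  let get = Bigarray.Array1.get
  let sub = Bigarray.Array1.sub
end

let () =
  let a = Bigarray.Array1.create Bigarray.int Bigarray.c_layout 4 in
  Bigarray.Array1.fill a 7;
  let r = Ro_array1.of_array1 a in
  Printf.printf "dim = %d, r[2] = %d\n" (Ro_array1.dim r) (Ro_array1.get r 2)
```

This is a compile-time guard only: code that still holds the original `Array1.t` can write to it, so it is not a substitute for mprotect.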

Just to let you know, guys: COW works as expected and I did literally nothing to make it work :slight_smile:

Thanks for your time and attention :slight_smile:

Btw parallelism is made simpler in OCaml 5.0 (currently available in alpha version) using Domains.

Yes, but it’s not yet production-ready.
And my current implementation is also simple enough: I use Parmap.
