Let’s say I want to create a really large Bigarray, say 80% of all available physical memory, and fill it with some data. Now I want to fork 40 children to process the data in this BA. In theory, since Linux has COW (copy-on-write), as long as I don’t write to this BA, all those processes should work over the same physical memory.
Two questions:
Will they?
Does OCaml already expose an mprotect interface, or do I need to write it myself?
COW will help you share the array data itself, but keep in mind each process will still need its own page tables to index the data. On x86-64 with 4 KiB pages and 8-byte page-table entries, that is roughly 0.2% of the mapped size per process, so 40 children each indexing 80% of RAM works out to roughly 6% of total RAM in page tables alone (huge pages would shrink that). Depending on what else needs to run in the 20% of RAM left, it could be a problem.
Even if it is read-only access, I’d be careful about the garbage collector. I did the same thing you are describing a few years back in Python (not OCaml), to share large read-only tensors across many forked children. I was surprised that the COW benefits were lost, because pages were modified by Python’s reference counting even on read-only retrieval. (The setup in the Stack Overflow question “Shared-memory and multiprocessing” was fairly similar. I did end up with a solution but can’t remember exactly what it was.)
The point about the GC modifying data is valid: the GC can update two bits in the headers of OCaml values. However, the data in a Bigarray lives outside the OCaml heap: it has no header and is not scanned by the GC, so I think you are safe in this case.
1. In the parent process, concatenate the many large tensors’ ndarray.data into a single huge data file in /dev/shm. (I found some bugs in Python’s mmap under multiprocessing, so if my procedure looks complicated, there was a reason.)
2. In the parent process, write out some indexing information (tensor id, file offset, etc.) and the ndarray attributes into a control file. This was a production fleet that couldn’t tolerate downtime; using control and data files on /dev/shm let the forked children load new read-only tensors when they became available.
3. In the forked children, use a custom multiprocessing Manager to mmap the single data file into a single Python buffer object, and recreate the many ndarray Python tensors from the control file. The ndarray attributes are modified in each forked child:
   a) each ndarray.data has to point into the single mmap’d data file, and
   b) each ndarray.strides has to offset where the tensor is in the data file.
4. To make sure the many ndarray tensors were not touched by Python’s reference counter during a fork, I had to rewrite either the fork code or the de/serialization code. (But here is where I’m forgetting the finer details.)
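For the OCaml side of this, the data-file half of the scheme could look roughly like the sketch below. Everything here is illustrative (the path, the load_tensor helper, and the assumption that the parent has already written raw float64 data into one file under /dev/shm); it is not the method from my Python setup, just a minimal translation of the idea using Unix.map_file:

```ocaml
(* Illustrative only: the parent is assumed to have concatenated raw float64
   tensors into this one file; (offset, len) pairs come from a control file. *)
let data_path = "/dev/shm/tensors.data"

(* Child side: map the whole data file and slice out one tensor by offset.
   shared = false gives a private copy-on-write mapping, so a read-only fd
   is enough, and unmodified pages stay shared between the children. *)
let load_tensor ~offset ~len =
  let fd = Unix.openfile data_path [ Unix.O_RDONLY ] 0 in
  let total = (Unix.fstat fd).Unix.st_size / 8 in  (* 8 bytes per float64 *)
  let all =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.float64 Bigarray.c_layout false [| total |])
  in
  Unix.close fd;
  Bigarray.Array1.sub all offset len               (* a view, not a copy *)
```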
Good luck! Hopefully OCaml makes your real-world use case easier to implement+manage than Python.
Thank you for the explanation. From the description I can see why the GC was the issue: the architecture of your data is much more complicated than mine. In my case, since I work with basically a plain single chunk of memory, the GC effects should be much less severe.
Your approach should work.
Try with a rather small Bigarray at first, just to check things in a simple setting (e.g. that all processes can read it, or parts of it, correctly). Then you can try processing your “big data” that takes almost all the RAM.
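As a concrete starting point, a minimal smoke test along these lines might look like the following (all sizes and names are made up; scale n up once it works):

```ocaml
(* Fill a small Bigarray in the parent, fork a few children, and have each
   child verify its own slice. The children only read, so with copy-on-write
   the payload pages should remain shared between all processes. *)
let () =
  let open Bigarray in
  let n = 1_000_000 and nproc = 4 in
  let a = Array1.create float64 c_layout n in
  for i = 0 to n - 1 do a.{i} <- float_of_int i done;
  let chunk = n / nproc in
  for p = 0 to nproc - 1 do
    match Unix.fork () with
    | 0 ->
        (* child p checks its own part of the array *)
        let lo = p * chunk in
        for i = lo to lo + chunk - 1 do
          assert (a.{i} = float_of_int i)
        done;
        exit 0
    | _child_pid -> ()
  done;
  for _ = 1 to nproc do ignore (Unix.wait ()) done
```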
Do you really need mprotect if you mmap a file which was opened in read-only mode?
Maybe you can create a Bigarray module which is read-only:
include the Bigarray module in another module where all the modifying operations are hidden by the interface.
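For instance, a minimal sketch of such a wrapper (module and function names are made up; it wraps Array1 rather than include-ing Bigarray, and keeps the type abstract so Bigarray.Array1.set cannot be applied to it):

```ocaml
module Ro_array : sig
  type t
  val of_array1 :
    (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t -> t
  val length : t -> int
  val get : t -> int -> float
end = struct
  type t = (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t
  let of_array1 a = a
  let length = Bigarray.Array1.dim
  let get a i = a.{i}  (* only reads are exposed; set stays hidden *)
end
```

Note this only hides mutation behind the abstraction: whoever created the underlying array still holds a writable handle to it.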
Maybe some libraries on opam already provide that.