Using a bigarray as a shared memory for parallel programming

I am wondering whether memory-mapped bigarrays can be used as shared memory for parallel programming.

I am thinking about the write-one-read-many mode of communication, where a writer
process would create a bigarray as a file on disk.
Once this file is created, several reader processes would access it for reading
(only for reading) by mapping it into memory (Unix.map_file).

If I am not mistaken, this would allow sharing all types supported by bigarrays between processes, without having to marshal/unmarshal.

The synchronization between readers and writers would be done outside of the bigarray, using semaphores.
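A minimal sketch of the write-once-read-many scheme described above, assuming OCaml 4.08 or later with the unix library linked (e.g. `(libraries unix)` in dune). The function names `write_floats`/`read_floats` and the float64 element kind are hypothetical choices for illustration. Semaphore-based synchronization is not shown, so here readers must only map the file after the writer has finished.

```ocaml
(* Sketch only: assumes OCaml >= 4.08 (Fun.protect) and the unix library. *)
open Bigarray

(* Writer: create [path], map it shared for [n] float64 elements (map_file
   grows the file as needed), and fill it. Stores go through the page
   cache, so later mappings of the same file see them. *)
let write_floats path n (f : int -> float) =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT; Unix.O_TRUNC ] 0o644 in
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () ->
      let a = array1_of_genarray (Unix.map_file fd float64 c_layout true [| n |]) in
      for i = 0 to n - 1 do
        a.{i} <- f i
      done)

(* Reader: map an existing file from a read-only descriptor. The [| -1 |]
   dimension lets map_file infer the length from the file size. Closing
   the descriptor does not unmap the region. *)
let read_floats path : (float, float64_elt, c_layout) Array1.t =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () ->
      array1_of_genarray (Unix.map_file fd float64 c_layout false [| -1 |]))
```

For example, the writer calls `write_floats path 1000 float_of_int` and each reader then calls `read_floats path`. Readers use a private (`false`) mapping, so an accidental write only touches their own copy-on-write pages, never the file.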

Your idea reminds me a lot of the shared memory used in Facebook Hack/Flow/Pyre:

We really need a library similar to Python’s multiprocessing. It has pipes (via serialization), queues, and shared memory support for primitive C types (like bigarray). It’s great and fairly easy to use. And no monads: those should be optional.

We’d probably need to have atomic reference counting in the shared memory area, with finalizers from OCaml code.

Yes, Bigarray is very suitable for this task. The shared memory module used by Hack/Flow/Pyre does not use Bigarray, but I am actually working on a slightly lower-level shared memory API that does.

Using Bigarray from OCaml is actually a bit nicer than calling into C, since OCaml implements a number of primitives for reading/writing into Bigarray which can be optimized into simple mov instructions, avoiding the need for a call entirely. It’s also nice to be able to write small functions and modules and rely on ocamlopt’s inliner.

For an example of this, take a look at Jane Street’s Bin_prot library, which makes extensive use of these primitives to implement very efficient serialization code.
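A small sketch of the "small functions over Bigarray primitives" style, using a hypothetical layout that stores 2D points as consecutive float pairs. One detail matters: ocamlopt can only turn `.{}` accesses into direct loads and stores when the bigarray's kind and layout are known statically, hence the type annotations; otherwise the access goes through a generic C call.

```ocaml
(* Sketch only: the "points" layout is a hypothetical example. *)
open Bigarray

(* Concrete type so that ocamlopt can specialize the accesses. *)
type points = (float, float64_elt, c_layout) Array1.t

let create n : points = Array1.create float64 c_layout (2 * n)

(* Tiny accessors; [@inline] asks the compiler to inline them at call sites. *)
let[@inline] get_x (p : points) i = p.{2 * i}
let[@inline] get_y (p : points) i = p.{(2 * i) + 1}

let[@inline] set (p : points) i x y =
  p.{2 * i} <- x;
  p.{(2 * i) + 1} <- y

let length (p : points) = Array1.dim p / 2

(* Example traversal built from the accessors. *)
let sum_x (p : points) =
  let acc = ref 0.0 in
  for i = 0 to length p - 1 do
    acc := !acc +. get_x p i
  done;
  !acc
```

The same accessors work unchanged on a bigarray obtained from `Unix.map_file`, since it has the same type.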

If you are looking to store OCaml values in shared memory directly, then you will run into other issues. A big one is the GC. The runtime will not know what to do if it reaches values outside of the managed heap. You can work around this by ensuring the GC bits of these values are “black,” but you still need to be very careful to avoid pointers from shared memory into the managed heap, as the GC can move those values out from under you.

Furthermore, polymorphic operations like compare, hashing, and marshaling will not know what to do with your values and will treat them as opaque pointers. That is, the string value “foo” in shared memory would not be =-comparable with “foo” in the managed heap. String.compare should work.

What’s more, since OCaml’s value representation uses absolute pointers, you need to ensure that those pointers are valid in all processes with a mapping to shared memory. You could do this by making the mappings all at a fixed address, which is kind of yucky, or somehow encoding/decoding the pointers with respect to a base pointer, which feels pretty complicated.
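The "encode pointers relative to a base" option can be sketched by never storing pointers at all: data lives in a char bigarray and is referred to by its offset from the start of the region, which means the same thing in every process regardless of where the region is mapped. This is a hypothetical single-writer arena for illustration; in a real setup `buf` would come from `Unix.map_file`, and `top` would also have to live in the shared region.

```ocaml
(* Sketch only: hypothetical single-writer string arena. References are
   byte offsets from the start of [buf], not absolute pointers. *)
open Bigarray

type arena = {
  buf : (char, int8_unsigned_elt, c_layout) Array1.t;
  mutable top : int; (* next free byte; kept in the OCaml heap here *)
}

let create size = { buf = Array1.create char c_layout size; top = 0 }

(* Append a 4-byte little-endian length followed by the string bytes;
   return the offset, which plays the role of a pointer. *)
let add a s =
  let n = String.length s in
  if a.top + 4 + n > Array1.dim a.buf then failwith "arena full";
  let off = a.top in
  for k = 0 to 3 do
    a.buf.{off + k} <- Char.chr ((n lsr (8 * k)) land 0xff)
  done;
  String.iteri (fun i c -> a.buf.{off + 4 + i} <- c) s;
  a.top <- off + 4 + n;
  off

(* Decode the length, then copy the bytes into a regular OCaml string. *)
let get a off =
  let n = ref 0 in
  for k = 3 downto 0 do
    n := (!n lsl 8) lor Char.code a.buf.{off + k}
  done;
  String.init !n (fun i -> a.buf.{off + 4 + i})
```

Because `get` copies the bytes back into the managed heap, the result is an ordinary string, so polymorphic `=` and hashing behave as usual on it.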

Certainly they are. Consider looking at Gerd Stolpmann’s multicore library. It is available as part of ocamlnet, so you can install it with

 opam install ocamlnet

See also netshm.

I know ocamlnet. I used it in parany (Netmcore_queue):

I also know parmap quite well; I added several features to it and use
it extensively in production.

In fact, I am thinking about writing my second parallel library for OCaml.
And I don’t want to pull in the big ocamlnet dependency this time.
Also, I want finer-grained control over locking via semaphores.
And I don’t want the shm to be governed by the GC.

I think I know this library, and found the interface pretty cumbersome
and the code overly complex.
I think it is available in opam in the package hack_parallel.

What you are describing looks like ocamlnet.

http://projects.camlcity.org/projects/dl/ocamlnet-4.0.4/doc/html-main/Intro.html#netshm
http://projects.camlcity.org/projects/dl/ocamlnet-4.0.4/doc/html-main/Intro.html#netcamlbox
http://projects.camlcity.org/projects/dl/ocamlnet-4.0.4/doc/html-main/Intro.html#netmulticore

I don’t want the shm to be managed by the GC.
So the only values that can be put in the shm will be the basic
types supported by the Bigarray module.
I know parmap does marshal/unmarshal to/from a char bigarray, but that’s not what I want to do.
I want to avoid Marshal.
I might end up with a library in which users have to provide their own read/write functions to/from the Bigarray they have allocated.
But I expect this to be faster than the Marshal module, because it will mostly
be data copying. It should also be more compact.
So the library I am considering may not be completely generic.
But for some use cases it will do the job and should be pretty fast.
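One way the "users provide their own read/write functions" idea could look is a module signature that each user type implements, with the library offering generic helpers on top. Everything here (`CODEC`, `Store`, the `sample` record) is hypothetical, a sketch of the design rather than an existing API; values are plain data copies with no Marshal involved.

```ocaml
(* Sketch only: hypothetical codec-based store over a float64 Bigarray. *)
open Bigarray

type buf = (float, float64_elt, c_layout) Array1.t

module type CODEC = sig
  type t
  val width : int                     (* number of floats per value *)
  val write : buf -> int -> t -> unit (* store a value at a float offset *)
  val read : buf -> int -> t          (* load a value from a float offset *)
end

(* Example user type: a timestamped measurement. *)
type sample = { time : float; value : float }

module Sample_codec : CODEC with type t = sample = struct
  type t = sample
  let width = 2
  (* The [buf] annotation lets ocamlopt specialize the accesses. *)
  let write (b : buf) off s =
    b.{off} <- s.time;
    b.{off + 1} <- s.value
  let read (b : buf) off = { time = b.{off}; value = b.{off + 1} }
end

(* Generic helpers a library could provide for any CODEC. *)
module Store (C : CODEC) = struct
  let create n : buf = Array1.create float64 c_layout (n * C.width)
  let set b i x = C.write b (i * C.width) x
  let get b i = C.read b (i * C.width)
  let length (b : buf) = Array1.dim b / C.width
end

module Samples = Store (Sample_codec)
```

The trade-off is the one described above: each type needs a hand-written codec, but the stored form is compact and copying in or out is just a few loads and stores.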

ocamlnet doesn’t exactly fit what I’m talking about. In many ways, ocamlnet is more powerful than what I described, in that it can store OCaml values in shared memory. However, it has to do horrible things to make it work (like overriding C functions in the runtime system) and is quite unsafe.

In my mind, the only safe way to transfer ocaml values between processes is via serialization over pipes, and if you want to use shared memory, you should only use C-types.
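The "serialization over pipes" approach can be sketched with `Unix.fork`, `Unix.pipe` and `Marshal`: the child computes a value and sends it to the parent. This assumes a POSIX system, the unix library, and OCaml 4.12 or later (for `Unix._exit`); `send_from_child` is a hypothetical name. Note that `Marshal.from_channel` is not type-safe, so the caller must know the type it expects.

```ocaml
(* Sketch only: assumes POSIX, the unix library and OCaml >= 4.12. *)
let send_from_child (produce : unit -> 'a) : 'a =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: serialize the result into the pipe. Unix._exit skips
         at_exit handlers and buffer flushing inherited from the parent. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      let code =
        try
          Marshal.to_channel oc (produce ()) [];
          close_out oc;
          0
        with _ -> 1
      in
      Unix._exit code
  | pid ->
      (* Parent: read one marshaled value, then reap the child. If the
         child failed, from_channel raises End_of_file. *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      Fun.protect
        ~finally:(fun () ->
          close_in ic;
          ignore (Unix.waitpid [] pid))
        (fun () -> (Marshal.from_channel ic : 'a))
```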

Then, in Parmap there is a hack that consists of exchanging OCaml values by marshalling them to/from a char bigarray.

Maybe you can start from that for your use case.

By the way, I also put it in opam as the bytearray package,
because I thought it could be useful outside of parmap.
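The general idea of marshalling to/from a char bigarray can be sketched with the standard library alone. This is an illustration of the technique, not parmap's or bytearray's actual code; the function names are hypothetical, and in practice the bigarray would be a shared mapping rather than one allocated with `Array1.create`.

```ocaml
(* Sketch only: not the parmap/bytearray implementation. *)
open Bigarray

type bytearray = (char, int8_unsigned_elt, c_layout) Array1.t

(* Marshal [v] to bytes, then copy those bytes into a char Bigarray. *)
let marshal_to_bigarray v : bytearray =
  let b = Marshal.to_bytes v [] in
  let n = Bytes.length b in
  let ba = Array1.create char c_layout n in
  for i = 0 to n - 1 do
    ba.{i} <- Bytes.get b i
  done;
  ba

(* Copy the bigarray back into bytes and unmarshal. Marshal is not
   type-safe: the caller must annotate the expected result type. *)
let unmarshal_from_bigarray (ba : bytearray) =
  let n = Array1.dim ba in
  let b = Bytes.create n in
  for i = 0 to n - 1 do
    Bytes.set b i ba.{i}
  done;
  Marshal.from_bytes b 0
```

The two copies plus the (un)marshalling cost are what the original poster wants to avoid by restricting the shared data to plain Bigarray element types.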

Just to clarify, shm in ocamlnet (see the netshm module) is not governed by the GC; it is a bigarray under the hood.

From my understanding, it has its own GC.

The concurrent hashmap that it implements has actually been made into its own library: https://github.com/rvantonder/hack-parallel/

I am thinking about the write-one-read-many mode of communication, where a writer
process would create a bigarray as a file on disk. Once this file is created, several reader processes would access it for reading (only for reading) by mapping it into memory (Unix.map_file).

To perhaps state the obvious, you don’t need to back it with a disk-based file. You should be able to map /dev/zero, and then you get a shared region without the overhead of ever synchronizing to disk.

It sounds weird, but that’s what glibc does to serve larger sized malloc() calls.
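A sketch of the `/dev/zero` approach, assuming Linux (where a shared mapping of `/dev/zero` behaves as anonymous shared memory), the unix library, and OCaml 4.12 or later. One caveat worth keeping in mind: such a region is only shared with processes forked after the mapping is created; unrelated processes that map `/dev/zero` each get their own zero-filled region. The helper names are hypothetical.

```ocaml
(* Sketch only: assumes Linux, the unix library and OCaml >= 4.12. *)
open Bigarray

(* Map [n] ints from /dev/zero with shared = true; no file on disk is
   involved, and the region starts zero-filled. *)
let shared_ints n : (int, int_elt, c_layout) Array1.t =
  let fd = Unix.openfile "/dev/zero" [ Unix.O_RDWR ] 0 in
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () ->
      array1_of_genarray (Unix.map_file fd int c_layout true [| n |]))

(* Fork a child that writes into the inherited shared region; the
   parent waits, after which the child's writes are visible to it. *)
let fill_in_child (a : (int, int_elt, c_layout) Array1.t) =
  match Unix.fork () with
  | 0 ->
      for i = 0 to Array1.dim a - 1 do
        a.{i} <- 10 * i
      done;
      Unix._exit 0
  | pid -> ignore (Unix.waitpid [] pid)
```

If the readers are separate programs rather than forked children, they need a named object to map instead, such as a regular file.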

It seems like this library has a somewhat old version of the shared mem code. The most notable improvement since this snapshot is the addition of an in-place compacting GC for the heap.

Before the new GC, we did something pretty silly: allocating a new temporary heap with malloc sized to fit all live objects, copy everything into the temporary heap, then back into shared mem.

So the old GC can cause very spiky allocation for large heaps and can run afoul of the OOM killer.
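To illustrate why in-place compaction avoids the temporary heap, here is a generic sliding compaction over a bigarray heap. It is not Flow's actual algorithm; the object layout (a header word holding `size lsl 1 lor mark`, followed by `size` payload words) is a hypothetical one chosen for the sketch. Live objects only ever move toward lower offsets, so copying forward in place never overwrites data that has not been read yet.

```ocaml
(* Sketch only: hypothetical heap layout, not the Hack/Flow implementation. *)
open Bigarray

type heap = (int, int_elt, c_layout) Array1.t

let header size marked = (size lsl 1) lor (if marked then 1 else 0)
let size_of h = h lsr 1
let is_marked h = h land 1 = 1

(* Compact the objects in [0, top): slide marked objects down, drop the
   unmarked ones, clear marks. Returns the new top and the (old, new)
   offsets of moved objects, which a real implementation would use to
   rewrite references (e.g. hash-table slots) to those objects. *)
let compact (heap : heap) top =
  let src = ref 0 and dst = ref 0 and moves = ref [] in
  while !src < top do
    let h = heap.{!src} in
    let words = 1 + size_of h in
    if is_marked h then begin
      if !dst <> !src then begin
        moves := (!src, !dst) :: !moves;
        (* dst < src, so a forward word-by-word copy is safe *)
        for k = 0 to words - 1 do
          heap.{!dst + k} <- heap.{!src + k}
        done
      end;
      heap.{!dst} <- header (size_of h) false;
      dst := !dst + words
    end;
    src := !src + words
  done;
  (!dst, List.rev !moves)
```

Peak extra memory is just the move list, instead of a second buffer sized to fit every live object.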

[Ed: @theblatte I just clicked through to your profile and now understand that the above is information you already had :P]

Is it possible to update this library with the latest algorithm?

It should be fairly straightforward to take the code from Flow and update the hack-parallel repo. We don’t maintain that library ourselves, but if I find a spare moment I might do it myself. Maybe during the holidays.