Mmap, munmap, mremap

Dear All, I want to write some code using an mmap, but at some point I might need to “grow” the mmap. So I need to munmap and mmap again… or even mremap. But the Unix module doesn’t have munmap or mremap.

Does anyone know of existing code to incrementally grow an mmap? Or failing that, bindings to the Linux mmap/munmap/mremap functionality?

Thanks in advance

Oh, and msync() would be useful to have as well!

Your question probably lacks some context. Are you referring to mmap-backed Bigarrays specifically?

Yes, Bigarrays, or even Bigstrings, are the target interface on the OCaml side.

For a specific missing feature, I think the issue tracker is your friend, but only after looking through the wealth of related discussions about msync and munmap. See for instance Add BigArray.Genarray.free by talex5 · Pull Request #389 · ocaml/ocaml · GitHub, which mentions incompatibilities with some optimisations that eliminate bounds checks; these might apply to you as well.

But I am still not sure about what you want to do, what is the bigger picture here?

(For msync see here: MPR#6567: add Unix.flush_mapped_file by xavierleroy · Pull Request #1432 · ocaml/ocaml · GitHub)

Thanks! Those links are relevant, and I see there has been discussion about this for many years. Even flushing a mapped file seems somehow difficult. And closing a map explicitly via munmap breaks some compiler optimizations I understand.

I guess I agree with mshinwell that explicitly freeing an mmap and then attempting to use the bigarray is a fault of the programmer. But the resulting error (segfault?) is clearly a bad thing, and I can see why OCaml would want to avoid such things. So, I suppose the best that can be hoped for is a non-core library that supports flush_mapped_file and even munmap? (Together with the risk of segfaults.) Does such a thing exist?

I don’t think there are problems with msync; the PR probably just needs to be resurrected.
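For reference, at the C level a flush is little more than one msync call. Below is a minimal sketch, not the code from the PR: the helper name flush_range is made up for this example, and it assumes a POSIX system and a mapping created with MAP_SHARED. The only subtlety is that msync wants a page-aligned start address, so the offset is rounded down first.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of OCaml or any library): synchronously
   write back the bytes [offset, offset + len) of a shared file mapping
   that starts at `base`. Returns 0 on success, -1 on failure. */
int flush_range(void *base, size_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* msync requires a page-aligned address: round the start down and
       lengthen the range by the same amount. */
    size_t start = offset - offset % (size_t)page;
    return msync((char *)base + start, len + (offset - start), MS_SYNC);
}
```

MS_SYNC blocks until the write-back has completed; MS_ASYNC would only schedule it. The mapping itself stays in place either way.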

Since you are asking this question I imagine that it is not possible for you to simply create a second Bigarray with the new size with the same backing file. I think it is possible to remain compatible with the OCaml semantics (modulo possible segfaults if there are out-of-bounds accesses in your program instead of an exception, though not undefined behaviour) as follows:

  • Start by reserving more virtual address space than you would eventually need
  • Have a Bigarray whose length corresponds to the memory you have reserved
  • Replace the mapping with fixed mappings backed by the file when you grow it

You would need to write the mapping and resizing functions in C, but this should not be too complicated; you can start from the implementation of the function caml_unix_map_file.
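To make the three steps above more concrete, here is a rough C sketch of the reserve-then-remap idea. It is only an illustration under some assumptions: the names reserve_region and grow_mapping are invented for this example, it targets Linux/POSIX, the file is always mapped from offset 0, and wrapping the reserved region in a Bigarray (as caml_unix_map_file does for an ordinary mapping) is left out.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve `max_len` bytes of virtual address space.
   PROT_NONE means any access faults, and nothing is backed by a file yet.
   Returns NULL on failure. */
void *reserve_region(size_t max_len)
{
    void *p = mmap(NULL, max_len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: make the first `new_len` bytes of the reserved
   region a shared mapping of `fd`, starting at file offset 0. `base` must
   be the address returned by reserve_region and `new_len` must not exceed
   the reserved size. Returns 0 on success, -1 on failure. */
int grow_mapping(void *base, int fd, size_t new_len)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;

    /* Extend the file if it is too short: touching mapped pages that lie
       entirely beyond the end of the file raises SIGBUS. */
    if (st.st_size < (off_t)new_len && ftruncate(fd, (off_t)new_len) != 0)
        return -1;

    /* MAP_FIXED replaces whatever is currently mapped at `base`, so the
       base address (and any pointer derived from it) stays valid. */
    void *p = mmap(base, new_len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

Because the base address never changes, a Bigarray created over the whole reservation would not need to be recreated when the file grows. The sketch remaps the whole prefix on each call for simplicity; mapping only the newly added, page-aligned tail should also work.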

Thank you for all your help! It has been very useful. I will have a think about what to do.

I am still curious about your broader use-case. What problem are you trying to solve? Maybe there is a different solution.

Well, the summary is that we have huge data files. We can mmap them and it seems to be a reasonable performance win over pread/pwrite. But the files may also be appended to. So currently we mmap in a huge size, and if the file data ever grows to that limit, we drop the mmap and mmap an even bigger size. But the drop of the mmap forces all changes to disk and is costly etc. Also, it seems reasonable to want to have some way of forcing changes to disk (msync) without actually dropping the mmap.
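For comparison with the drop-and-remap approach, this is roughly what growing the mapping with mremap would look like in C. It is a sketch under assumptions, not something the Unix module offers: resize_mapping is a made-up name, mremap is Linux-only, and the file has to be extended (for example with ftruncate) before the new pages are touched.

```c
#define _GNU_SOURCE   /* mremap is a Linux extension */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: resize an existing mapping from `old_len` to
   `new_len` bytes without unmapping it first. With MREMAP_MAYMOVE the
   kernel may relocate the mapping, so the caller must use the returned
   address from then on. Returns NULL on failure. */
void *resize_mapping(void *old_base, size_t old_len, size_t new_len)
{
    void *p = mremap(old_base, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

The catch on the OCaml side is that the mapping may move, so an existing Bigarray pointing at the old address could no longer be used safely.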

extunix might be a library that would accept such features.
