The Unix library seems to lack a binding to msync to synchronize files and their map.
IIUC this means one needs to wait for the Gc to trigger the finalizer of the map’s bigarray so that the mapping gets properly munmapped. Not a very predictable guarantee, but that’s ok for, say, short-lived CLI tools.
However I’m a bit curious in which conditions this could not be properly done (segfault only ?) and what would then ensue. I couldn’t spot in the manual if these finalizers are actually guaranteed to be run at the end of the program (e.g. Gc.finalize finalizers are not).
So is writing to files with Unix.map_file any reliable without a binding to msync ?
No. Finalizers are only run on program exit if you use OCAMLRUNPARAM=c, so I think some writes may be lost without an msync.
Writing to memory mapped files is difficult to do reliably:
- If anything goes wrong there is no interface to report errors when writing to the map other than SIGBUS (and the OS may decide to do that on its own even before you call msync).
  - This means the application will crash on ENOSPC or other I/O errors (e.g. problems with a remote filesystem).
- There may be cache coherency issues between writing with mmap and reading with read. I don’t know if it is still true today, but OpenBSD used to be one of those systems.
- Even if you restrict yourself to just one OS you may still end up with a corrupted file. E.g. SQLite has an experimental writable mmap mode for accessing the database, but when I tried turning it on it resulted in corrupt databases on Linux. I was never able to narrow down where the bug was (IIRC it took 24h+ of continuous stress testing to trigger the corruption), but with the defaults (using read/write instead of mmap) SQLite is very reliable, and lately SQLite would only use mmap in read-only mode.
Although that doesn’t mean that a binding to msync shouldn’t exist, there may be applications where the above issues are acceptable (e.g. single-threaded CLI tool that wants to write to a large file in a binary format without dealing with buffering).
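For contrast, here is a minimal sketch of the non-mmap path, assuming the unix library and OCaml 4.08 or later (for Unix.fsync and Fun.protect). The names write_file_durably and write_file_durably' are made up for this illustration. The point is only that with write, failures such as ENOSPC come back as a Unix.Unix_error the program can handle, rather than as a SIGBUS.

```ocaml
(* Hypothetical helper: write [data] to [path] with plain write(2), then
   fsync(2). I/O failures (ENOSPC, EIO, ...) are raised as Unix.Unix_error. *)
let write_file_durably path data =
  let fd = Unix.openfile path Unix.[O_WRONLY; O_CREAT; O_TRUNC] 0o644 in
  let finally () = try Unix.close fd with Unix.Unix_error _ -> () in
  Fun.protect ~finally @@ fun () ->
  let len = String.length data in
  (* Unix.write_substring repeats the write until [len] bytes are written
     or an error occurs. *)
  let written = Unix.write_substring fd data 0 len in
  assert (written = len);
  (* Ask the OS to flush the file contents to storage before closing. *)
  Unix.fsync fd

(* Same, with errors turned into a result. *)
let write_file_durably' path data =
  try Ok (write_file_durably path data) with
  | Unix.Unix_error (e, fn, _) ->
      Error (Printf.sprintf "%s: %s" fn (Unix.error_message e))
```

How much fsync actually guarantees still depends on the OS and filesystem, but at least every failure has a place to be reported.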
An accurate description of the shortcut I wanted to take; bigarray IO is horribly painful at the moment (likely less so once you can afford to require the to-be-released OCaml 5.2).
For earlier OCaml versions you can try bigstring-unix 0.3 (its associated GitHub repo is archived and points to ‘bigstringaf’ instead, but that package has no IO functions AFAICT).
Or perhaps release the OCaml 5.2 bigarray IO function implementations as a separate library for earlier versions (and forward to the builtin ones in newer)?
Thanks. But somehow mmap breaks a lot of other things (e.g. stdin on read). It seems I lured myself into using bigarrays by resurrecting old code of mine, while string and bytes are perfectly fine in that case.
In any case, if someone is interested in having simple functions to play with mmap reads and (unreliable) writes, these functions could help:
(* SPDX-License-Identifier: CC0-1.0 *)
let bigbytes_of_file ?(trunc = false) ?length access file =
  let module Bigarray = Stdlib.Bigarray (* OCaml < 5 install woes *) in
  let flags, shared = match access with
  | `R -> Unix.[O_RDONLY], false
  | `RW -> Unix.(O_CREAT :: O_RDWR :: if trunc then [O_TRUNC] else []), true
  in
  let fd = Unix.openfile file flags 0o644 in
  let finally () = try Unix.close fd with Unix.Unix_error _ -> () in
  Fun.protect ~finally @@ fun () ->
  (* mmap on macOS returns EINVAL rather than ENODEV on dirs so we check before *)
  let stat = Unix.fstat fd in
  if stat.st_kind <> S_REG
  then raise (Unix.Unix_error (ENODEV, "bigbytes_of_file", file)) else
  let length = match length with
  | None -> -1
  | Some length when access = `RW -> length
  | Some length -> Int.min length (Unix.lseek fd 0 Unix.SEEK_END)
  in
  let typ = Bigarray.int8_unsigned and layout = Bigarray.C_layout in
  let map = Unix.map_file fd typ layout shared [|length|] in
  Bigarray.array1_of_genarray map

let bigbytes_of_file' ?trunc ?length access file =
  try Ok (bigbytes_of_file ?trunc ?length access file) with
  | Unix.Unix_error (ENODEV, _, _) -> Error "Not a file"
  | Unix.Unix_error (e, _, _) -> Error (Unix.error_message e)
An mmapped bigarray is the efficient way to write large int or float arrays to file.
Like a 1000 times faster than marshalling a regular array to file, in my tests.
But how do you guarantee that all changes have actually made it to disk? OCaml won’t unmap the file when you close it (in fact not even on exit). You could force a garbage collection if you’re sure nothing holds a reference to it anymore (e.g. hide the actual bigarray behind an option ref that you set to None and then call full_major twice).
And without an explicit msync+munmap you can’t know that on all filesystems and all OSes your changes have actually made it to disk.
See “mmaped bigarrays over NFS” (issue #3571 in ocaml/ocaml on GitHub, referenced from the Bigarray unmap implementation in OCaml), which references the Linux NFS FAQ: “Although some implementations of munmap(2) happen to write dirty pages to local file systems, the NFS version of munmap(2) does not. An msync(2) call is always required to guarantee that dirty mapped data is written to permanent storage.”
Although the OS itself would unmap the pages upon program exit I don’t think we can rely on it to also call msync on our behalf.
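The option-ref trick could look roughly like the sketch below, assuming the unix library; map_rw, fill and release are hypothetical names chosen for the example. It only tries to get the bigarray finalized (and hence munmapped) at a known point. It does not call msync, so the caveats above still apply.

```ocaml
(* Map [n] bytes of [path] read/write and return the only long-lived
   handle on the mapping, hidden behind an option ref. *)
let map_rw path n =
  let fd = Unix.openfile path Unix.[O_RDWR; O_CREAT; O_TRUNC] 0o644 in
  let finally () = try Unix.close fd with Unix.Unix_error _ -> () in
  Fun.protect ~finally @@ fun () ->
  (* shared = true: the file is grown to [n] bytes if it is smaller. *)
  let map = Unix.map_file fd Bigarray.char Bigarray.c_layout true [| n |] in
  ref (Some (Bigarray.array1_of_genarray map))

(* Write through the mapping, as long as it has not been released. *)
let fill holder c =
  match !holder with
  | None -> invalid_arg "fill: mapping already released"
  | Some a -> Bigarray.Array1.fill a c

(* Drop the reference and force two full major collections, hoping that
   the finalizer (which unmaps) runs. No msync happens here. *)
let release holder =
  holder := None;
  Gc.full_major ();
  Gc.full_major ()
```

This only works if nothing else (a closure, a sub-array, a local variable still in scope) keeps the bigarray reachable, which is hard to check in larger programs.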
How about re-submitting a PR to have msync in the standard library? There did not seem to be real objections to its inclusion. In addition, is there anything preventing you from incorporating the C implementation in your program in the meantime?
Why not but as @edwin mentioned that’s not the only problem of using this interface for writing to files. Error signalling and mmap platform implementation woes are an issue (even in the code I posted above there’s already a cross-platform wart).
In any case, with platforms that have a low Sys.max_string_length becoming rather anecdotal for OCaml, there was no good reason not to use simple bytes rather than bigbytes in this case.
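For what it’s worth, the plain string route can be sketched with the standard library alone, assuming OCaml 4.14 or later for In_channel and Out_channel. The names read_file and write_file are just illustrative.

```ocaml
(* Read a whole file into a string; I/O failures become [Error msg]. *)
let read_file path =
  try Ok (In_channel.with_open_bin path In_channel.input_all) with
  | Sys_error msg -> Error msg

(* Write a string to a file; I/O failures become [Error msg]. *)
let write_file path contents =
  try
    Ok (Out_channel.with_open_bin path (fun oc ->
        Out_channel.output_string oc contents;
        (* Flush explicitly so that write errors are raised here. *)
        Out_channel.flush oc))
  with Sys_error msg -> Error msg
```

On 64-bit platforms Sys.max_string_length is large enough that this covers most files a CLI tool is likely to meet.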
The mmap() function shall add an extra reference to the file associated with the file descriptor fildes which is not removed by a subsequent close() on that file descriptor. This reference shall be removed when there are no more mappings to the file.
I think if you want to make that reliable in the current state of affairs you should make sure dst is no longer reachable and then do a Gc.full_major ().
That would ensure dst’s finalizer – which calls munmap – gets called.
(In general it’s not a good idea to rely on Gc finalizers for anything that involves resource management).
Depending on the OS there might even be no way to fix it. Look at this repro from 2023 with FreeBSD and NFSv3: 270810 – munmap does not always sync the underlying file.
Neither msync nor fsync helps there: if the process exits before the NFS syncer in the kernel has finished flushing data to the NFS server then data is permanently lost.
Although as the ticket says that seems to be a bug in the OS, at least one of those functions should ensure this does not happen.
Linux doesn’t seem to have that bug, and with a few quick tests on NFS I wasn’t able to reproduce any lost writes, but if you want to develop portable OCaml applications that work on all the OSes that OCaml supports then I’d suggest avoiding writes with mmap.
Well, maybe we need a binding to munmap then.
When the programmer knows all writes to the mmapped memory region are finished, they call munmap to force all pending writes to happen.