Is writing to files with Unix.map_file usable at all?

The Unix library seems to lack a binding to msync to synchronize files and their map.

IIUC this means one needs to wait for the GC to trigger the finalizer of the map’s bigarray so that the mapping gets properly munmapped. Not a very predictable guarantee, but that’s OK for, say, short-lived CLI tools.

However, I’m a bit curious about the conditions under which this might not happen properly (only on a segfault?) and what would then ensue. I couldn’t find in the manual whether these finalizers are actually guaranteed to run at the end of the program (e.g. Gc.finalise finalizers are not).

So is writing to files with Unix.map_file reliable at all without a binding to msync?


No. Finalizers are only run on program exit if you use OCAMLRUNPARAM=c, so I think some writes may be lost without an msync.

Writing to memory mapped files is difficult to do reliably:

  • If anything goes wrong, there is no interface to report errors when writing to the map other than SIGBUS (and the OS may decide to raise that on its own, even before you call msync)
  • There may be cache coherency issues between writing with mmap and reading with read. I don’t know whether that is still true today, but OpenBSD used to be one of those systems
  • Even if you restrict yourself to just one OS, you may still end up with a corrupted file. E.g. SQLite has an experimental writable mmap mode for accessing the database, but when I tried turning it on it resulted in corrupt databases on Linux. I was never able to narrow down where the bug was (IIRC it took 24h+ of continuous stress testing to trigger the corruption), but with the defaults (using read/write instead of mmap) SQLite is very reliable, and lately SQLite only uses mmap in read-only mode
  • There are very few applications that write to an mmap by default (I’m only aware of LMDB), so you may run into OS bugs: “some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases”

That doesn’t mean a binding to msync shouldn’t exist, though: there may be applications where the above issues are acceptable (e.g. a single-threaded CLI tool that wants to write a large file in a binary format without dealing with buffering).


Thanks for your insights Edwin.

That’s an accurate description of the shortcut I wanted to take: bigarray IO is horribly painful at the moment (likely less so once you can afford to require the upcoming OCaml 5.2).


For earlier OCaml versions you can try bigstring-unix 0.3, its latest release (although its associated GitHub repository is archived and points to bigstringaf instead, and that package has no IO functions AFAICT).
Or perhaps the OCaml 5.2 bigarray IO functions could be released as a separate library for earlier versions (forwarding to the built-in ones on newer versions)?
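For illustration only, a minimal sketch of such a fallback for OCaml < 5.2, assuming a char bigarray in C layout. It copies the bigarray through a reusable Bytes buffer and hands each chunk to Unix.write. The name write_bigarray and the 64 KiB chunk size are arbitrary choices for this sketch, not an existing API.

```ocaml
(* Hypothetical fallback for OCaml < 5.2 (which lacks built-in bigarray IO):
   copy the bigarray through a reusable Bytes buffer and write each chunk. *)
let write_bigarray fd
    (ba : (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t) =
  let len = Bigarray.Array1.dim ba in
  let chunk_size = 65536 in
  let buf = Bytes.create chunk_size in
  let pos = ref 0 in
  while !pos < len do
    let n = min chunk_size (len - !pos) in
    (* copy the next chunk of the bigarray into the bytes buffer *)
    for i = 0 to n - 1 do
      Bytes.set buf i (Bigarray.Array1.get ba (!pos + i))
    done;
    (* Unix.write retries until all n bytes are written, or raises *)
    let written = Unix.write fd buf 0 n in
    pos := !pos + written
  done
```

The extra copy costs some time, but errors are reported as ordinary Unix_error exceptions rather than SIGBUS.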


Thanks. But somehow mmap breaks a lot of other things (e.g. reading from stdin). It seems I lured myself into using bigarrays by resurrecting old code of mine, while string and bytes are perfectly fine in this case.

In any case, if someone is interested in simple functions to play with mmap reads and (unreliable) writes, these could help:

(* SPDX-License-Identifier: CC0-1.0 *)

let bigbytes_of_file ?(trunc = false) ?length access file =
  let module Bigarray = Stdlib.Bigarray (* OCaml < 5 install woes *) in
  let flags, shared = match access with
  | `R -> Unix.[O_RDONLY], false
  | `RW -> Unix.(O_CREAT :: O_RDWR :: if trunc then [O_TRUNC] else []), true
  in
  let fd = Unix.openfile file flags 0o644 in
  let finally () = try Unix.close fd with Unix.Unix_error _ -> () in
  Fun.protect ~finally @@ fun () ->
  (* mmap on macOS returns EINVAL rather than ENODEV on directories,
     so we check the file kind ourselves before mapping *)
  let stat = Unix.fstat fd in
  if stat.Unix.st_kind <> Unix.S_REG
  then raise (Unix.Unix_error (Unix.ENODEV, "bigbytes_of_file", file)) else
  let length = match length with
  | None -> -1
  | Some length when access = `RW -> length
  | Some length -> Int.min length (Unix.lseek fd 0 Unix.SEEK_END)
  in
  let typ = Bigarray.int8_unsigned and layout = Bigarray.C_layout in
  let map = Unix.map_file fd typ layout shared [|length|] in
  Bigarray.array1_of_genarray map

let bigbytes_of_file' ?trunc ?length access file =
  try Ok (bigbytes_of_file ?trunc ?length access file) with
  | Unix.Unix_error (Unix.ENODEV, _, _) -> Error "Not a file"
  | Unix.Unix_error (e, _, _) -> Error (Unix.error_message e)

For numerics, LaurentMazare/ocaml-torch has examples of using Unix.map_file, certainly for reading, but I’m not sure how it’s writing the tensors.

An mmapped bigarray is the efficient way to write large int or float arrays to a file:
in my tests it was about 1000 times faster than marshalling a regular array to a file.


But how do you guarantee that all changes have actually made it to disk? OCaml won’t unmap the file when you close it (in fact, not even on exit). You could force a garbage collection if you’re sure nothing holds a reference to it anymore (e.g. hide the actual bigarray behind an option ref that you set to None, and then call Gc.full_major twice).
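A minimal sketch of that workaround, assuming a float64 payload; write_floats_mapped is a hypothetical name. It only makes the bigarray finalizer (and thus munmap) run earlier. It does not call msync, so it still gives no durability guarantee.

```ocaml
(* Hypothetical helper: write [data] to [path] through a shared float64
   mapping, then drop the mapping so its finalizer can munmap it.
   No msync is performed, so durability is NOT guaranteed. *)
let write_floats_mapped path (data : float array) =
  let n = Array.length data in
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT; Unix.O_TRUNC] 0o644 in
  (* the only reference to the bigarray lives in this option ref *)
  let map =
    ref (Some (Bigarray.array1_of_genarray
                 (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [|n|])))
  in
  (* mmap holds its own reference to the file, so the fd can be closed now *)
  Unix.close fd;
  (match !map with
   | Some ba -> Array.iteri (fun i x -> Bigarray.Array1.set ba i x) data
   | None -> ());
  (* make the bigarray unreachable, then run two full major GCs so its
     finalizer (which calls munmap) gets a chance to run *)
  map := None;
  Gc.full_major ();
  Gc.full_major ()
```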

And without an explicit msync+munmap you can’t know that on all filesystems and all OSes your changes have actually made it to disk.
See “mmaped bigarrays over NFS” (ocaml/ocaml issue #3571 on GitHub, referenced from the Bigarray unmap implementation in OCaml), which cites the Linux NFS FAQ: “Although some implementations of munmap(2) happen to write dirty pages to local file systems, the NFS version of munmap(2) does not. An msync(2) call is always required to guarantee that dirty mapped data is written to permanent storage.”
Although the OS itself would unmap the pages upon program exit I don’t think we can rely on it to also call msync on our behalf.


How about re-submitting a PR to add msync to the standard library? There did not seem to be any real objections to its inclusion. In the meantime, is there anything preventing you from incorporating the C implementation into your own program?


Why not, but as @edwin mentioned, that’s not the only problem with using this interface for writing to files. Error signalling and platform-specific mmap woes are also an issue (even the code I posted above already has a cross-platform wart).

In any case, now that platforms with a low Sys.max_string_length have become rather anecdotal for OCaml, there was no good reason not to use plain bytes rather than bigbytes in this case.
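A minimal sketch of the plain-bytes alternative, assuming the whole payload fits in memory; make_payload and write_file_bytes are hypothetical helpers. Write errors surface as exceptions from ordinary channel IO rather than as SIGBUS. Note that flush does not fsync, so this is not a durability guarantee either.

```ocaml
(* Hypothetical helper: allocate the payload as plain bytes. On 32-bit
   platforms Sys.max_string_length is only about 16 MB. *)
let make_payload n =
  if n > Sys.max_string_length then invalid_arg "make_payload: too large";
  Bytes.create n

(* Hypothetical helper: write [b] with ordinary channel IO, so failures
   are reported as exceptions rather than SIGBUS. *)
let write_file_bytes path (b : Bytes.t) =
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out_noerr oc) @@ fun () ->
  output_bytes oc b;
  (* flush here so that write errors raise Sys_error instead of being
     swallowed by close_out_noerr; note this is not an fsync *)
  flush oc
```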

So is this code buggy?

Let’s say n is the size of my bigarray.
src is an already populated bigarray.
I use this code to dump a bigarray to disk on Linux.

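  module BA = Bigarray
  module BA1 = Bigarray.Array1

  (* dump [src], an already populated float32 bigarray, to file [fn] *)
  let dump fn src =
    let n = BA1.dim src in
    let fd = Unix.(openfile fn [O_RDWR; O_CREAT; O_TRUNC] 0o600) in
    (* create mmapped bigarray *)
    let dst =
      BA.array1_of_genarray
        (Unix.map_file fd BA.Float32 BA.c_layout true [|n|]) in
    (* copy existing bigarray to the (new) one mapped to file *)
    BA1.blit src dst;
    Unix.close fd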

Yes, this is buggy.

The mmap() function shall add an extra reference to the file associated with the file descriptor fildes which is not removed by a subsequent close() on that file descriptor. This reference shall be removed when there are no more mappings to the file.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

So close doesn’t actually release the file, and all the limitations that @edwin mentioned still apply.
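A small sketch of what the POSIX text implies, assuming a 4-byte char mapping (write_after_close is a hypothetical name): a write made after Unix.close still reaches the file. The read-back through ordinary IO assumes a unified buffer cache, as on Linux; per @edwin’s coherency caveat it may not hold everywhere.

```ocaml
(* Hypothetical demo: the mapping outlives Unix.close on its descriptor. *)
let write_after_close path =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
  let ba =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout true [|4|])
  in
  (* close only drops the descriptor, not the mapping's file reference *)
  Unix.close fd;
  (* so this write, made after close, still lands in the file *)
  Bigarray.Array1.fill ba 'z';
  (* read back through ordinary IO; this relies on a unified buffer cache *)
  let ic = open_in_bin path in
  let s = really_input_string ic (in_channel_length ic) in
  close_in ic;
  s
```

The mapping itself stays alive until the bigarray is collected, which is exactly why close alone gives no guarantee.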

I think if you want to make that reliable in the current state of affairs you should make sure dst is no longer reachable and then do a Gc.full_major ().

That would ensure dst’s finalizer – which calls munmap – gets called.

(In general it’s not a good idea to rely on GC finalizers for anything that involves resource management.)


Depending on the OS, there might even be no way to fix it. Look at this 2023 repro with FreeBSD and NFSv3: bug 270810, “munmap does not always sync the underlying file”.
Neither msync nor fsync helps there: if the process exits before the kernel’s NFS syncer has finished flushing data to the NFS server, the data is permanently lost.
As the ticket says, this seems to be an OS bug, since at least one of those functions should ensure this does not happen.

Linux doesn’t seem to have that bug, and with a few quick tests on NFS I wasn’t able to reproduce any lost writes, but if you want to develop portable OCaml applications that work on all the OSes OCaml supports, I’d suggest avoiding writes through mmap.


Well, maybe we need a binding to munmap then.
When the programmer knows that all writes to the mmapped memory region are finished, they call munmap
to force all pending writes to happen.

Maybe, but it is still a bit unclear how the bigarray itself should be invalidated once its memory has been unmapped.