Race condition with Unix.lockf?

Hi.

I have a server that handles requests with this piece of code:

658 let read_or_create ?magic fname read write =
659   assert (Secure.check fname) ;
660   print_endline (Printf.sprintf "(%d) %s %s %b%!" (Unix.getpid ()) __LOC__ fname (Sys.file_exists fname)) ;
661   let fd = Unix.openfile fname [ Unix.O_RDWR ; O_CREAT ] 0o666 in
662   begin
663     try Unix.lockf fd Unix.F_TLOCK 0
664     with _ ->
665       print_endline (Printf.sprintf "(%d) %s%!" (Unix.getpid ()) __LOC__) ;
666       Unix.sleep 0 ;
667       Unix.lockf fd Unix.F_LOCK 0
668   end ;
669   print_endline (Printf.sprintf "(%d) %s%!" (Unix.getpid ()) __LOC__) ;
670   let ic = open_in_bin fname in
671   let default () =
672     close_in ic ;
673     let oc = open_out_bin fname in
674     begin match magic with Some m -> output_string oc m | None -> () end ;
675     let r = write oc in
676     flush oc ;
677     print_endline (Printf.sprintf "(%d) %s : %d%!" (Unix.getpid ()) __LOC__ (out_channel_length oc)) ;
678     close_out oc ;
679     Unix.lockf fd Unix.F_ULOCK 0 ;
680     Unix.close fd ;
681     r
682   in
683   if in_channel_length ic > 0
684   then begin
685     if match magic with None -> true | Some m -> print_endline (Printf.sprintf "(%d) %s%!" (Unix.getpid ()) __LOC__) ; check_magic m ic
686     then begin
687       print_endline (Printf.sprintf "(%d) %s%!" (Unix.getpid ()) __LOC__) ;
688       let r = read ic in
689       close_in ic ;
690       print_endline (Printf.sprintf "(%d) %s%!" (Unix.getpid ()) __LOC__) ;
691       Unix.lockf fd Unix.F_ULOCK 0 ;
692       Unix.close fd ;
693       r
694     end else (print_endline (Printf.sprintf "(%d) %s%!" (Unix.getpid ()) __LOC__) ;  default () )
695   end else (print_endline (Printf.sprintf "(%d) %s%!" (Unix.getpid ()) __LOC__) ;
696 default () )

I expect the second request's non-blocking lock attempt (F_TLOCK) to fail, then block on F_LOCK until the first request has finished processing (and written its result to fname), and finally read the file.

I can assure you that the write function is not releasing the lock. Basically, what it does is:

        output_string oc Mutil.executable_magic ;
        Marshal.to_channel oc tstab [ Marshal.No_sharing ; Marshal.Closures ] ;

Here is what is logged in the console.

(28052) File "lib/util/mutil.ml", line 660, characters 67-74 tstab false
(28052) File "lib/util/mutil.ml", line 669, characters 61-68
(28052) File "lib/util/mutil.ml", line 695, characters 71-78
(28055) File "lib/util/mutil.ml", line 660, characters 67-74 tstab true
(28055) File "lib/util/mutil.ml", line 669, characters 61-68
(28055) File "lib/util/mutil.ml", line 695, characters 71-78
(28052) File "lib/util/mutil.ml", line 677, characters 68-75 : 4973238
(28055) File "lib/util/mutil.ml", line 677, characters 68-75 : 4973238

Am I missing something obvious?

PS: do not pay attention to “useless” operations, I am just trying to debug this.

It’s hard to say what’s going on with your example. Did you try to trace system calls in the two processes that run concurrently? (“strace” under Linux.)

Also, if you’re trying to create and write a file atomically, it might be simpler to write to a unique temporary file in the same directory (Filename.open_temp_file), then rename the temporary file (Sys.rename or Unix.rename).
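
A minimal sketch of that temporary-file approach, using only the standard library. The helper name write_atomically and the "cache" / ".tmp" prefix and suffix are made up for illustration; the only assumption is that the temporary file lives in the same directory as the target, so the rename stays on one filesystem.

```ocaml
(* Hypothetical helper: write [fname] via a temporary file, then rename.
   Readers see either the old file or the complete new one, never a
   partially written file. *)
let write_atomically fname (write : out_channel -> unit) =
  (* Same directory as the target, so the rename does not cross filesystems. *)
  let dir = Filename.dirname fname in
  let tmp, oc =
    Filename.open_temp_file ~mode:[ Open_binary ] ~temp_dir:dir "cache" ".tmp"
  in
  match
    write oc;
    close_out oc
  with
  | () -> Sys.rename tmp fname
  | exception e ->
    (* On failure, clean up the temporary file and re-raise. *)
    close_out_noerr oc;
    (try Sys.remove tmp with Sys_error _ -> ());
    raise e
```

Something like write_atomically "tstab" (fun oc -> Marshal.to_channel oc v []) would then replace the direct open_out_bin. Note that this does not by itself stop two processes from doing the same computation; it only guarantees that nobody reads a half-written file.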


POSIX lockf(3) and fcntl(2) locks are released when the process that holds them closes ANY file descriptor that was open on that file.

Apologies for not really following what you’re up to there, but just at a casual glance, that could be happening at line 672, where close_in ic closes a second descriptor that was opened on the same file.
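
One way to avoid this, sketched below, is to do all reading and writing through the single descriptor that holds the lock and to close it only once, at the very end. This is not the original function: with_locked_file and this simplified read_or_create are hypothetical names, the magic check is left out, and the code assumes the unix library is linked.

```ocaml
(* Hypothetical helper: run [f] while holding a lock on [fname].
   The file is opened exactly once; closing [fd] is what releases the lock. *)
let with_locked_file fname (f : Unix.file_descr -> 'a) : 'a =
  let fd = Unix.openfile fname [ Unix.O_RDWR; Unix.O_CREAT ] 0o666 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      (* Blocks until no other process holds the lock. *)
      Unix.lockf fd Unix.F_LOCK 0;
      f fd)

(* Simplified variant of the function from the question (no magic check). *)
let read_or_create fname ~(read : in_channel -> 'a)
    ~(write : out_channel -> 'a) : 'a =
  with_locked_file fname (fun fd ->
    if (Unix.fstat fd).Unix.st_size > 0 then
      (* Channel built on the locked descriptor: no second open, and we do
         not close the channel, since that would close [fd] as well. *)
      read (Unix.in_channel_of_descr fd)
    else begin
      let oc = Unix.out_channel_of_descr fd in
      let r = write oc in
      (* Flush but do not close: the lock must outlive the write. *)
      flush oc;
      r
    end)
```

The point of the design is that open_in_bin and open_out_bin never appear, so there is no extra descriptor whose close could drop the lock early.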

Thank you both for your answers.

I was not aware of the fact that it would release the lock when closing any channel / file descriptor on a file. Thanks for the tip, that was what was wrong here.

:+1:

PS: What I am doing is computing a cache file, which can take several seconds if it is not already present. I do not want other processes to repeat the same computation, so the goal is for them to wait until the cache file has been written and then just read it.
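
For that use case, another option is to take the lock on a separate companion file, so that channels on the cache file itself can be opened and closed freely without touching the lock. A rough sketch, assuming the unix library is linked and a string payload; the function name cached and the ".lock" suffix are invented for the example.

```ocaml
(* Hypothetical sketch: compute the cache once, other processes wait. *)
let cached fname ~(compute : unit -> string) : string =
  let read_all path =
    let ic = open_in_bin path in
    Fun.protect
      ~finally:(fun () -> close_in_noerr ic)
      (fun () -> really_input_string ic (in_channel_length ic))
  in
  (* The lock lives on a companion file, never on [fname] itself. *)
  let lock_fd =
    Unix.openfile (fname ^ ".lock") [ Unix.O_RDWR; Unix.O_CREAT ] 0o666
  in
  Fun.protect
    ~finally:(fun () -> Unix.close lock_fd)   (* releases the lock *)
    (fun () ->
      (* Blocks while another process is computing the cache. *)
      Unix.lockf lock_fd Unix.F_LOCK 0;
      if Sys.file_exists fname then read_all fname
      else begin
        let data = compute () in
        let oc = open_out_bin fname in
        output_string oc data;
        close_out oc;
        data
      end)
```

If the writer can crash halfway through, the existence check alone is not enough; writing to a temporary file and renaming it, as suggested earlier in the thread, would cover that case.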