Strange bug: make files disappear in both Linux and OS X

While developing pardi (https://github.com/UnixJunkie/pardi),
I came across a problem. I run OCaml 4.06.1.

Here are some steps if you want to reproduce the bug:

git clone git@github.com:UnixJunkie/pardi.git
cd pardi
opam install batteries dolog parany minicli str
make
./maybe_bug.sh

You may have to run ./maybe_bug.sh a few times.
If you see a ‘No such file or directory’ error: bingo, you have
reproduced the bug.

If you can reproduce, or cannot, I am interested to know.
I see this both on OS X and Linux.

I have no idea where this comes from.

If cat is called, it means the file was created before by another process.
But, the file has disappeared?!

Thanks for any feedback,
Francois.

On Linux, adding ‘; sync -f $out’ after ‘ls -l $in $out’ in maybe_bug.sh
makes the problem appear more frequently.

Maybe @Chet_Murthy has an idea? :face_with_monocle:

I presume this is what you mean by a repro, yes?

chet@twitter:~/Hack/ismlgo/tmp/pardi$ ./maybe_bug.sh 
2019-06-27 21:17:54.123 INFO : job_dir: /tmp/pardi_wTYK
cat: /tmp/pardi_wTYK/pardi_out_000000024: No such file or directory
cat: /tmp/pardi_wTYK/pardi_out_000000026: No such file or directory
cat: /tmp/pardi_wTYK/pardi_out_000000027: No such file or directory
cat: /tmp/pardi_wTYK/pardi_out_000000029: No such file or directory

It’s kinda late here, but hey, I’ll have a look in the AM, if nobody else has diagnosed?

1 Like

Bingo, you reproduced!
If it is a bug, it smells like something quite nasty.

I ran the script 100 times and so far I can’t reproduce the bug on my older laptop (Ubuntu 18.04, core i5 5200U). I will try on another one this morning.

1 Like

The two machines where I saw the bug have many cores.
The Linux machine has 16 cores, hyperthreading is off.
The Mac has 24 cores, I don’t know about hyperthreading in there.
Both machines have an Intel Xeon E5 (but then a different version number).

$ uname -a
Linux port050 4.16.0-0.bpo.1-amd64 #1 SMP Debian 4.16.5-1~bpo9+1 (2018-05-06) x86_64 GNU/Linux
$ lscpu 
Architecture :        x86_64
Mode(s) opératoire(s) des processeurs : 32-bit, 64-bit
Boutisme :            Little Endian
Processeur(s) :       4
Liste de processeur(s) en ligne : 0-3
Thread(s) par cœur : 2
Cœur(s) par socket : 2
Socket(s) :           1
Nœud(s) NUMA :       1
Identifiant constructeur : GenuineIntel
Famille de processeur : 6
Modèle :             142
Nom de modèle :      Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz
Révision :           9
Vitesse du processeur en MHz : 599.983
Vitesse maximale du processeur en MHz : 3800,0000
Vitesse minimale du processeur en MHz : 400,0000
BogoMIPS :            4800.00
Virtualisation :      VT-x
Cache L1d :           32K
Cache L1i :           32K
Cache L2 :            256K
Cache L3 :            4096K
Nœud NUMA 0 de processeur(s) : 0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves ibpb ibrs stibp dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
$ sudo dmidecode | grep -i HTT
		HTT (Multi-threading)
$git clone https://github.com/UnixJunkie/pardi.git
$cd pardi
$opam switch create 4.06.1
$opam install batteries dolog parany minicli
$make

matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:29:58.175 INFO : job_dir: /tmp/pardi_ZqgQ
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:01.068 INFO : job_dir: /tmp/pardi_6dTp
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:03.038 INFO : job_dir: /tmp/pardi_rSmv
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:06.848 INFO : job_dir: /tmp/pardi_EbFR
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:08.607 INFO : job_dir: /tmp/pardi_ceOi
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:10.861 INFO : job_dir: /tmp/pardi_ZWD5
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:12.933 INFO : job_dir: /tmp/pardi_J4Nb
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:14.699 INFO : job_dir: /tmp/pardi_Ow77
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:16.226 INFO : job_dir: /tmp/pardi_Xjfr
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:18.821 INFO : job_dir: /tmp/pardi_9U9Q
^[[Amatt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:20.290 INFO : job_dir: /tmp/pardi_Zghh
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:22.658 INFO : job_dir: /tmp/pardi_rNKr
cat: /tmp/pardi_rNKr/pardi_out_000000028: Aucun fichier ou dossier de ce type
cat: /tmp/pardi_rNKr/pardi_out_000000031: Aucun fichier ou dossier de ce type
matt@port050:~/projets/pardi$ ./maybe_bug.sh
2019-06-28 09:30:26.201 INFO : job_dir: /tmp/pardi_60FB

Cordialement

1 Like

Thanks, so you could reproduce it too.

I think that your dmidecode command tells you that the CPU is
multi-threading capable. However, it doesn’t tell you if it is enabled
or not.

On Linux, this command does:

lscpu | grep 'per core:'

It seems hypertreading is on:

$ lscpu | grep "Thread"
Thread(s) par cœur : 2

It is 100% reproducible on mine machine with OCaml 4.07.0. and 20 cpus.

Are you creating files in one process and reading it in another without any kind of synchronization and expecting it to work?

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                20
On-line CPU(s) list:   0-19
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1200.203
CPU max MHz:           3100.0000
CPU min MHz:           1200.0000
BogoMIPS:              4389.72
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-19
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
1 Like

In the interest of crowd-sourcing amongst those commenters who decided to have a look at this bug, here’s a little data-point that might or might not be relevant.

(1) first, preserve the temp work-directory:

diff --git a/src/pardi.ml b/src/pardi.ml
index 618e9f3..9b22ef5 100644
--- a/src/pardi.ml
+++ b/src/pardi.ml
@@ -186,8 +189,10 @@ let main () =
     ~work:(process_some job_dir cmd)
     ~mux:(gather_some (ref 0) mux);
   printf "\n";
+(*
   if not !Flags.debug then
     Utls.run_command !Flags.debug (sprintf "rm -rf %s" job_dir);
+*)
   close_in in_chan
 
 let () = main ()

(2) then run the test, cleaning the work-dir before, ls-ing after:

$ rm -rf /tmp/pardi_* ; ./maybe_bug.sh  ; /bin/ls /tmp/pardi*
2019-06-28 14:02:04.802 INFO : job_dir: /tmp/pardi_pdLy
$ rm -rf /tmp/pardi_* ; ./maybe_bug.sh  ; /bin/ls /tmp/pardi*
2019-06-28 14:02:06.700 INFO : job_dir: /tmp/pardi_52Ew
cat: /tmp/pardi_52Ew/pardi_out_000000024: No such file or directory
pardi_out_000000005
$ rm -rf /tmp/pardi_* ; ./maybe_bug.sh  ; /bin/ls /tmp/pardi*
2019-06-28 14:02:11.860 INFO : job_dir: /tmp/pardi_nCeW
$ rm -rf /tmp/pardi_* ; ./maybe_bug.sh  ; /bin/ls /tmp/pardi*
2019-06-28 14:02:14.371 INFO : job_dir: /tmp/pardi_DPYB
$ rm -rf /tmp/pardi_* ; ./maybe_bug.sh  ; /bin/ls /tmp/pardi*
2019-06-28 14:02:16.100 INFO : job_dir: /tmp/pardi_EnIW
$ rm -rf /tmp/pardi_* ; ./maybe_bug.sh  ; /bin/ls /tmp/pardi*
2019-06-28 14:02:17.926 INFO : job_dir: /tmp/pardi_NVR7
cat: /tmp/pardi_NVR7/pardi_out_000000028: No such file or directory
pardi_out_000000006

(3) OK, somewhat noisy, but what I observe is that consistently when the bug manifests, the work-dir is nonempty after the run, and the remnant files are NOT the ones reported as missing.

(4) If I disable the “rm” of the “pardi_out_NNNN” files, the bug disappears. [disabling the “rm” of the “pardi_in_NNNN” files does nothing, as expected.]

==================

[possibly faulty] musings: first, I’ll assume that UNIX filesystem semantics are that all processes see compatible serial schedules of events. This might be incorrect, but geez, I sure hope not. And I can’t find anything to the contrary after a quick Google search. This seems to imply that what we’re seeing here is that some task that wants to delete pardi_out_0006, is instead deleting pardi_out_0028. I wonder whether the “gather_some” callback is being called with the correct arguments.

Anyway, that’s all I got for right now. Hope this is the right place to hold this discussion; I found the issue at Github, but it had a pointer back here, so I figured I should post here and not at that issue.

the “mux” (gather_some) phase is getting fed filenames that don’t yet exist:

index 618e9f3..4f3776f 100644
--- a/src/pardi.ml
+++ b/src/pardi.ml
@@ -109,6 +109,7 @@ let process_some job_dir cmd (count, tmp_in_fn) =
 let out_queue = Sorted_queue.create ()
 
 let gather_some mux_count mux_mode (count, tmp_out_fn) =
+  assert (Sys.file_exists tmp_out_fn) ;
   begin
     match mux_mode with
     | Mux.Null -> () (* tmp_out_fn is not removed? *)

quickly yields an assert failure:

$ rm -rf /tmp/pardi_* ; ./maybe_bug.sh  ; /bin/ls /tmp/pardi*
2019-06-28 15:11:12.318 INFO : job_dir: /tmp/pardi_hYgi
Fatal error: exception File "src/pardi.ml", line 112, characters 2-8: Assertion failed

@UnixJunkie, it seems that there’s some sort of corruption happening. To wit, the following one-line patch fixes your bug:

diff --git a/src/pardi.ml b/src/pardi.ml
index 618e9f3..ea808e7 100644
--- a/src/pardi.ml
+++ b/src/pardi.ml
@@ -109,6 +109,7 @@ let process_some job_dir cmd (count, tmp_in_fn) =
 let out_queue = Sorted_queue.create ()
 
 let gather_some mux_count mux_mode (count, tmp_out_fn) =
+  let tmp_out_fn =  Bytes.(tmp_out_fn |> of_string |> copy |> to_string) in
   begin
     match mux_mode with
     | Mux.Null -> () (* tmp_out_fn is not removed? *)

It seems that something is modifying the value of tmp_out_fn between the moment it’s inserted into out_queue and when it is popped out. Obviously we should discount the possibility of a real memory error, so I’m supposing that there’s code in netmulticore that demarshals, and is reusing memory somehow.

Looking at parany.ml, I see

      (* Netmcore_queue.pop_p avoids data copy out of the shared heap *)

My bet is, this is what’s at fault. Here’s some more substantiation. With the following (somewhat more-extensive) debugging patch, we get a clear indication that there’s a memory-error going on:

modified   src/pardi.ml
@@ -35,9 +35,16 @@ let rec read_one_block
 let read_buff = ref (Bytes.create 0)
 let read_fd = ref (Unix.descr_of_in_channel stdin)
 
+let format_filename job_dir is_input n =
+  let direction = if is_input then "in" else "out" in
+  sprintf "%s/pardi_%s_%09d" job_dir direction n
+
 let read_some job_dir buff count csize input demux () =
   let read = ref 0 in
+  let tmp_fn = format_filename job_dir true !count in
+(*
   let tmp_fn = sprintf "%s/pardi_in_%09d" job_dir !count in
+*)
   Utls.with_out_file tmp_fn (fun out ->
       match demux with
       | Demux.Bytes n ->
@@ -98,30 +105,59 @@ let output_fn_tag = Str.regexp "%OUT"
 let process_some job_dir cmd (count, tmp_in_fn) =
   assert(Utls.regexp_in_string input_fn_tag cmd);
   let cmd' = Str.replace_first input_fn_tag tmp_in_fn cmd in
+  let tmp_out_fn = format_filename job_dir false count in
+(*
   let tmp_out_fn = sprintf "%s/pardi_out_%09d" job_dir count in
+*)
+  assert (Sys.file_exists tmp_in_fn) ;
+  assert (not (Sys.file_exists tmp_out_fn)) ;
   assert(Utls.regexp_in_string output_fn_tag cmd');
   let cmd'' = Str.replace_first output_fn_tag tmp_out_fn cmd' in
   let cmd''' = sprintf "%s; rm -f %s" cmd'' tmp_in_fn in
   Utls.run_command !Flags.debug cmd''';
+  assert (Sys.file_exists tmp_out_fn) ;
   (count, tmp_out_fn)
 
 (* in case we need to preserve input order *)
 let out_queue = Sorted_queue.create ()
+let seen_count = (Hashtbl.create 23 : (int, unit) Hashtbl.t)
+let seen = (Hashtbl.create 23 : (string, unit) Hashtbl.t)
 
-let gather_some mux_count mux_mode (count, tmp_out_fn) =
+let gather_some job_dir mux_count mux_mode (count, tmp_out_fn) =
+(*
+  let tmp_out_fn =  Bytes.(tmp_out_fn |> of_string |> copy |> to_string) in
+*)
+  printf "GATHER (%d, %s)\n" count tmp_out_fn ;
+  assert (format_filename job_dir false count = tmp_out_fn) ;
+  assert (not (Hashtbl.mem seen_count count)) ;
+  Hashtbl.add seen_count count () ;
+  assert (not (Hashtbl.mem seen tmp_out_fn)) ;
+  Hashtbl.add seen tmp_out_fn () ;
+  if not (Sys.file_exists tmp_out_fn) then begin
+    eprintf "ERROR: file %s does not exist\n" tmp_out_fn ;
+    assert (Sys.file_exists tmp_out_fn) ;
+  end;
+(*
+  assert (Sys.file_exists tmp_out_fn) ;
+*)
   begin
     match mux_mode with
     | Mux.Null -> () (* tmp_out_fn is not removed? *)
     | Mux.Sort_cat_into dst_fn ->
       begin
-        Squeue.insert (count, tmp_out_fn) out_queue;
+        Squeue.insert (count, (count,tmp_out_fn)) out_queue;
         let popped = Squeue.pop_all out_queue in
-        List.iter (fun out_fn ->
+        List.iter (fun (count, out_fn) ->
+            printf "MUX (%d, %s)\n" count out_fn ;
+            assert (format_filename job_dir false count = out_fn) ;
             let cmd =
               if !mux_count = 0 then
                 sprintf "mv %s %s" out_fn dst_fn
               else
                 sprintf "cat %s >> %s; rm -f %s" out_fn dst_fn out_fn in
+(*
+                sprintf "cat %s >> %s" out_fn dst_fn in
+*)
             Utls.run_command !Flags.debug cmd;
             incr mux_count
           ) popped
@@ -184,10 +220,12 @@ let main () =
   Parany.run ~verbose:false ~csize:1 ~nprocs
     ~demux:(read_some job_dir (Buffer.create 1024) (ref 0) csize in_chan demux)
     ~work:(process_some job_dir cmd)
-    ~mux:(gather_some (ref 0) mux);
+    ~mux:(gather_some job_dir (ref 0) mux);
   printf "\n";
+(*
   if not !Flags.debug then
     Utls.run_command !Flags.debug (sprintf "rm -rf %s" job_dir);
+*)
   close_in in_chan
 
 let () = main ()

With this patch, the bug occurs, and if we egrep for “(GATHER|MUX)” we get:

GATHER (1, /tmp/pardi_xIhP/pardi_out_000000001)
GATHER (0, /tmp/pardi_xIhP/pardi_out_000000000)
MUX (0, /tmp/pardi_xIhP/pardi_out_000000000)
MUX (1, /tmp/pardi_xIhP/pardi_out_000000001)
GATHER (2, /tmp/pardi_xIhP/pardi_out_000000002)
MUX (2, /tmp/pardi_xIhP/pardi_out_000000002)
GATHER (6, /tmp/pardi_xIhP/pardi_out_000000006)
GATHER (3, /tmp/pardi_xIhP/pardi_out_000000003)
MUX (3, /tmp/pardi_xIhP/pardi_out_000000003)
GATHER (4, /tmp/pardi_xIhP/pardi_out_000000004)
MUX (4, /tmp/pardi_xIhP/pardi_out_000000004)
GATHER (5, /tmp/pardi_xIhP/pardi_out_000000005)
MUX (5, /tmp/pardi_xIhP/pardi_out_000000005)
MUX (6, /tmp/pardi_xIhP/pardi_out_000000025)

Notice that MUX(6) has a filename ending in “25”. No “GATHER” has appeared with that filename. Note also that the GATHER(6) has a filename ending in “06”. So something probably scribbled on the memory of that filename. The comment tells us that this is expected behaviour, I guess.

I think this is your problem. Just copy the arguments to gather_one and things should work as expected.

2 Likes

Ok, that should be it, because I already had to correct one such error in another project.
I’ll tell you if I fix this one.

I don’t know if you plan on transmitting significant amounts of data thru these queues from netmulticore, but if you do, you might wish to think about mimicking what Mark Hayden did in the Ensemble distributed systems toolkit. He used refcounted segments, which he doled-out in small immutable chunks. The refcounting was manual, but backed-up by finalization, so eventually if you failed to properly track references, the memory got cleaned-up. It was quite effective and really improved the performance of Ensemble, making it competitive with C … in 1997.

It’s all documented in his PhD thesis, chapter 4, I think. The chapter on the influence of memory-management, in the section on I/O vectors and such.

Just a thought.

P.S. Of course, only relevant if you push significant amounts of data thru these queues; otherwise, might as well just copy when reading from the queue, and be done with it.

I corrected the bug.
I added some features in my Parany library by the way (set_copy_on_work and set_copy_on_mux); so that the library can be used in a safer way than its defaults.
So yes, the bug was that some things were not copied out of the shared memory,
while they are somewhat long lived. So, at some point they could get mutated.
Thanks for the help.

In this particular project, it’s mostly metadata that goes in the shm. The real data
is passed around on disk via files.

Are you creating files in one process and reading it in another without any kind of synchronization and expecting it to work?

Yes, this is a POSIX property I rely on.
N.b. a file name is passed in the shm only once the file has been written and closed.

Are you creating files in one process and reading it in another without any kind of synchronization and expecting it to work?

i wasn’t sure whether POSIX filesystem semantics specified that all processes in a UNIX instance see a consistent view of filesystem operations, but OTOH, there are good reasons for assuming that it’s true. I mean … otherwise the shit would hit the fan on large NUMA machines.

Could you, please, reference this POSIX property? I would be very interested to know about one.

Besides, even if such exists, Linux is not fully POSIX compliant (and even in POSIX compliant systems there are many different layers of POSIX). And even in cases where an operating system might provide a POSIXly correct behavior, it is often necessary to pass a corresponding feature macro to enable it, as by default it could be disabled.

Concerning the consistency assumption, I wouldn’t be so sure. Speaking of Linux, there is the dentry cache layer, which is an in-memory data structure with a very complex locking mechanism and I’m not sure that I can find any (at least sequential) consistency guarantees stated. Therefore a few number of barriers might be needed. But as always careful reading of the source would give us the correct answer. However, since you’re using shm to pass the metadata and probably even semaphores, you were already introducing those barriers, which make the dcache view consistent.

P.S. Even after pulling the latest version of parany I can’t build the master of pardi, stopping on an error:

~/tmp/woodoo-dalmatian/pardi$ make
dune build @install -j 16
      ocamlc src/.pardi.eobjs/pardi.{cmi,cmo,cmt} (exit 2)
(cd _build/default && /home/ivg/.opam/4.07.0/bin/ocamlc.opt -w @a-4-29-40-41-42-44-45-48-58-59-60-40 -strict-sequence -strict-formats -short-paths -keep-locs -g -bin-annot -I src/.pardi.eobjs -I /home/ivg/.opam/4.07.0/lib/batteries -I /home/ivg/.opam/4.07.0/lib/bytes -I /home/ivg/.opam/4.07.0/lib/dolog -I /home/ivg/.opam/4.07.0/lib/equeue -I /home/ivg/.opam/4.07.0/lib/minicli -I /home/ivg/.opam/4.07.0/lib/netcamlbox -I /home/ivg/.opam/4.07.0/lib/netmulticore -I /home/ivg/.opam/4.07.0/lib/netplex -I /home/ivg/.opam/4.07.0/lib/netstring -I /home/ivg/.opam/4.07.0/lib/netsys -I /home/ivg/.opam/4.07.0/lib/num -I /home/ivg/.opam/4.07.0/lib/ocaml/threads -I /home/ivg/.opam/4.07.0/lib/parany -I /home/ivg/.opam/4.07.0/lib/rpc -no-alias-deps -opaque -o src/.pardi.eobjs/pardi.cmo -c -impl src/pardi.ml)
File "src/pardi.ml", line 192, characters 2-25:
Error: Unbound value Parany.set_copy_on_work
Makefile:4: recipe for target 'build' failed