Why does read_line throw End_of_file when used after Unix.pipe?

Context: To teach myself more about OCaml, parsers, and Unix systems programming, I’m writing a basic shell in OCaml.

If I enter something like ls -Z while running my shell, it successfully prints to stdout and moves the prompt to the next line. If I do something more complex like foo < infile 2> errfile > outfile, it still works just fine, moving the prompt to the next line. But if I do anything with any number of pipes (e.g. ls | grep foo), it will process the pipeline, emit to stdout, print the prompt a second time, and then immediately throw an End_of_file exception.

> ls | grep dune
dune-project
> 
// Immediately exits after printing > a second time

I’ve narrowed down the problem to my usage of read_line (). read_line is raising End_of_file immediately after a successful pipeline, without waiting for user input.

Here is my main loop

let repl _ =
  print_newline ();
  let rec repl' _ =
    print_string "> ";
    let ln = read_line () in (* This line is throwing on the second iteration*)
    let prog = parse_string ln in
    ignore(exec_conditional prog);
    repl' ()
  in
  try repl' () with End_of_file -> ()

let () = repl ()

And the body of my executor where I perform the pipe (kind of long, but included for completeness)

let lastexitcode = ref 0

let redirect {file_desc = fd; filename; _} =
  match fd with
  | 0 ->
      let filehandle = Unix.openfile filename [ O_RDONLY ] 0o640 in
      Unix.dup2 filehandle Unix.stdin
  | 1 ->
      let filehandle =
        Unix.openfile filename [ O_TRUNC; O_CREAT; O_WRONLY ] 0o640
      in
      Unix.dup2 filehandle Unix.stdout
  | 2 ->
      let filehandle =
        Unix.openfile filename [ O_TRUNC; O_CREAT; O_WRONLY ] 0o640
      in
      Unix.dup2 filehandle Unix.stderr
  | _ -> raise @@ ExecError "TODO: Impl arbitrary file descriptor redirection"

let exec_pipeline pipeline =
  let pipeline_array = Array.of_list pipeline in
  let upper_index_bound = Array.length pipeline_array - 1 in
  (* Iterate from 0 to just max index - 1, performing the final fork after this loop*)
  for i = 0 to upper_index_bound - 1 do
    let fd_in, fd_out = Unix.pipe () in
    let command = pipeline_array.(i) in
    let pid = Unix.fork () in
    if pid < 0 then
      raise
      @@ ExecError ("Fork failed for command: " ^ command_to_string command)
    else if pid > 0 then (
      Unix.dup2 fd_in Unix.stdin;
      Unix.close fd_out;
      Unix.close fd_in)
    else (
      Unix.dup2 fd_out Unix.stdout;
      Unix.close fd_out;
      Unix.close fd_in;
      List.iter redirect command.redirections;
      Unix.execvp command.executable
        (Array.of_list (command.executable :: command.args)))
  done;

  let command = pipeline_array.(upper_index_bound) in
  let pid = Unix.fork () in
  if pid < 0 then
    raise @@ ExecError ("Fork failed for command: " ^ command_to_string command)
  else if pid > 0 then
    let _, status = Unix.waitpid [ Unix.WUNTRACED ] pid in
    match status with
    | Unix.WEXITED exitcode -> lastexitcode := exitcode
    | _ -> failwith "TODO: Stopped and signalled processes unimplemented"
  else (
    List.iter redirect command.redirections;
    Unix.execvp command.executable
      (Array.of_list (command.executable :: command.args)));
  
  !lastexitcode

let rec exec_conditional = function
| BasePipeline p -> exec_pipeline p
| Or (lhs, rhs) ->
  let retcode = exec_conditional lhs in
  if retcode <> 0 then exec_conditional rhs else retcode
| And (lhs, rhs) ->
  let retcode = exec_conditional lhs in
  if retcode <> 0 then retcode else exec_conditional rhs

For brevity I’ve omitted some of my type definitions since they don’t seem to be the problem.

So in essence, my question is why read_line is throwing End_of_file immediately after I call Unix.pipe, and only after I call Unix.pipe, without waiting for user input? I see in the docs that read_line would throw if EOF is encountered at the start of input, but why would EOF show up before I enter any input on the second iteration?

Unix.dup2 src dst closes dst before cloning src into dst, so here you are closing the shell's original Unix.stdin in the parent process and replacing it with the read end of the pipe. Once the pipeline finishes and the write end is closed, read_line () will raise End_of_file, no?
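
To see the effect in isolation, here is a minimal sketch (not taken from the shell above) that points stdin at a pipe whose write end has already been closed. The names with_piped_stdin and read_line_opt are made up for the illustration, and it assumes the unix library is linked and OCaml 4.08 or later for Fun.protect.

```ocaml
(* Requires the unix library. All helper names here are hypothetical. *)

(* Run [f] with stdin temporarily replaced by the read end of a pipe that
   holds [contents], then put the original stdin back. *)
let with_piped_stdin contents f =
  let saved_stdin = Unix.dup Unix.stdin in
  let fd_in, fd_out = Unix.pipe () in
  ignore (Unix.write_substring fd_out contents 0 (String.length contents));
  (* With no writer left, a reader sees EOF once the data is consumed. *)
  Unix.close fd_out;
  (* Same call the parent branch of the loop makes: fd 0 is now the pipe. *)
  Unix.dup2 fd_in Unix.stdin;
  Unix.close fd_in;
  Fun.protect
    ~finally:(fun () ->
      Unix.dup2 saved_stdin Unix.stdin;
      Unix.close saved_stdin)
    f

(* Turn the exception into an option so both reads can be inspected. *)
let read_line_opt () = try Some (read_line ()) with End_of_file -> None

let first, second =
  with_piped_stdin "dune-project\n" (fun () ->
      let a = read_line_opt () in
      let b = read_line_opt () in
      (a, b))
(* first = Some "dune-project", second = None *)
```

The second read hits EOF without waiting for anyone to type, which matches what the prompt does on its second iteration.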

Cheers,
Nicolas

Ahhhh ok. Mea culpa. I was following this resource here as an example but didn’t consider that my case would be different. The author was writing a one-shot process that doesn’t need to return to a REPL/main loop, so it didn’t matter to them.

https://ocaml.github.io/ocamlunix/pipes.html#sec114

So I should only call dup2 in the child process, right?
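
For what it's worth, here is a rough sketch of what "dup2 only in the child" could look like. It is not the code from my shell: run_pipeline is a hypothetical helper that takes plain (executable, args) pairs instead of my command record, it skips redirections, and it assumes the unix library plus OCaml 4.12 or later for Unix._exit.

```ocaml
(* Requires the unix library; Unix._exit needs OCaml 4.12 or later. *)

(* Hypothetical helper: run a pipeline given as (executable, args) pairs and
   return the exit code of the last command. *)
let run_pipeline (commands : (string * string list) list) : int =
  let rec spawn input pids = function
    | [] -> List.rev pids
    | (exe, args) :: rest ->
        (* Every command but the last writes into a fresh pipe. *)
        let next_input, output =
          match rest with
          | [] -> (None, Unix.stdout)
          | _ ->
              let r, w = Unix.pipe () in
              (Some r, w)
        in
        (* Flush so buffered output is not duplicated into the child. *)
        flush_all ();
        (match Unix.fork () with
         | 0 ->
             (* Child: rewire fds 0 and 1 here, where it cannot affect the
                shell's own stdin. *)
             if input <> Unix.stdin then (
               Unix.dup2 input Unix.stdin;
               Unix.close input);
             if output <> Unix.stdout then (
               Unix.dup2 output Unix.stdout;
               Unix.close output);
             Option.iter Unix.close next_input;
             (try Unix.execvp exe (Array.of_list (exe :: args))
              with Unix.Unix_error _ -> Unix._exit 127)
         | pid ->
             (* Parent: only close its copies of the pipe ends. *)
             if input <> Unix.stdin then Unix.close input;
             if output <> Unix.stdout then Unix.close output;
             spawn
               (Option.value next_input ~default:Unix.stdin)
               (pid :: pids) rest)
  in
  let pids = spawn Unix.stdin [] commands in
  (* Wait for every child; the last status seen belongs to the last command. *)
  List.fold_left
    (fun _ pid ->
      match Unix.waitpid [] pid with
      | _, Unix.WEXITED code -> code
      | _, (Unix.WSIGNALED _ | Unix.WSTOPPED _) -> 255)
    0 pids
```

Something like run_pipeline [ ("ls", []); ("grep", [ "dune" ]) ] would then leave the shell's stdin alone, since the parent never calls dup2. Mapping signalled or stopped children to 255 is just a placeholder choice for the sketch.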

I managed to get it working!

I just added let tempin = Unix.dup Unix.stdin in at the beginning of exec_pipeline before my loop, and Unix.dup2 tempin Unix.stdin; as the second to last line, right before !lastexitcode.

Seems that all I needed to do was save the file descriptor somewhere and restore it. Not sure if this is the optimal solution (open to suggestions of course!) but it works.
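
One possible refinement, as a sketch only: the saved descriptor is never closed, so each pipeline leaves one more duplicate of stdin open, and stdin is not restored if something raises in between. A small wrapper can handle both. The names preserving_stdin, clobber_stdin and same_file are hypothetical, and this assumes the unix library and OCaml 4.08 or later for Fun.protect.

```ocaml
(* Requires the unix library. All helper names here are hypothetical. *)

(* Run [f], then restore fd 0 to whatever it was before, even if [f] raises. *)
let preserving_stdin f =
  (* cloexec keeps the saved copy out of any exec'd child. *)
  let saved = Unix.dup ~cloexec:true Unix.stdin in
  Fun.protect
    ~finally:(fun () ->
      Unix.dup2 saved Unix.stdin; (* put the original stdin back on fd 0 *)
      Unix.close saved) (* avoid leaking one descriptor per pipeline *)
    f

(* Stand-in for the pipeline loop: replace stdin with the read end of an
   empty pipe, the way the parent branch does. *)
let clobber_stdin () =
  let fd_in, fd_out = Unix.pipe () in
  Unix.close fd_out;
  Unix.dup2 fd_in Unix.stdin;
  Unix.close fd_in

(* Two stat results describe the same file if device and inode match. *)
let same_file a b =
  a.Unix.st_dev = b.Unix.st_dev && a.Unix.st_ino = b.Unix.st_ino

let stdin_survived =
  let before = Unix.fstat Unix.stdin in
  preserving_stdin clobber_stdin;
  same_file before (Unix.fstat Unix.stdin)
```

In exec_pipeline this would mean wrapping the loop and the final fork in preserving_stdin (fun () -> ...) instead of calling Unix.dup and Unix.dup2 by hand.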

@nojb Thank you so much for the tip!