How to run external command from OCaml script

ffreling · October 13, 2020, 5:31pm

I’m trying to replace a bunch of bash scripts with OCaml, but I’m having difficulty getting the output of external commands.

Here is my simple script:

#!/usr/bin/env utop

open Unix;;

let print_chan channel =
  let rec loop () =
      let () = print_endline (input_line channel) in
      loop ()
    in
  try loop ()
  with End_of_file -> close_in channel;;

let () =
  let (ocaml_stdout, ocaml_stdin, ocaml_stderr) = Unix.open_process_args_full "echo" [| "echo"; "foo" |] (Unix.environment ()) in
  close_out ocaml_stdin;
  print_chan ocaml_stdout;
  print_chan ocaml_stderr;
  print_endline "terminado!";

I am using utop as the toplevel to have access to common modules such as Unix.
Unfortunately, my script does not print anything except “terminado!”. I’d like to get the output of echo foo.

zozozo · October 13, 2020, 6:58pm

I think you’re missing a call to Unix.close_process_full (cf https://caml.inria.fr/pub/docs/manual-ocaml/libref/Unix.html#VALclose_process_full ) to wait for the termination of the command.

nojb · October 13, 2020, 7:23pm

Hello,
This is not directly related to your question, but for shell-style scripting, the combination of Sys.command + Filename.quote_command is a very robust and portable alternative to using Unix, and is simpler to use.
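
A minimal sketch of that combination, assuming OCaml 4.10 or later (where Filename.quote_command is available) and a Unix-like system with echo on the PATH. The helper name run is made up for illustration. Note that the command's output goes straight to the terminal here and is not captured; quote_command also accepts optional ?stdout and ?stderr arguments that redirect to files.

```ocaml
(* Sketch: run a command through the shell with safely quoted arguments.
   [run] is a hypothetical helper, not part of the standard library. *)
let run prog args =
  (* quote_command builds one shell command line, quoting each argument *)
  let cmd = Filename.quote_command prog args in
  (* Sys.command returns the exit code of the command *)
  Sys.command cmd

let () =
  (* The second argument contains a space and a quote; quoting keeps it
     intact as a single argument *)
  let status = run "echo" [ "foo"; "it's one argument" ] in
  Printf.printf "exit status: %d\n" status
```

Because the arguments are passed as a list and quoted for you, there is no need to escape spaces or quotes by hand.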

Cheers,
Nicolas

ffreling · October 13, 2020, 8:56pm

I tried doing so but I get an Exception: Sys_error "Bad file descriptor".

let () =
  let ((ocaml_stdout, ocaml_stdin, ocaml_stderr) as p) = Unix.open_process_args_full "echo" [| "echo"; "foo" |] (Unix.environment ()) in
  let _ = Unix.close_process_full p in
  close_out ocaml_stdin;
  print_chan ocaml_stdout;
  print_chan ocaml_stderr;
  print_endline "terminado!";

ffreling · October 13, 2020, 8:58pm

I would like to avoid creating temporary files for this use case. I don’t mind the heavier API (once I get it working).

nojb · October 14, 2020, 2:38am

Unix.close_process_full already takes care of closing the file descriptors, so you should not close them yourself.

Cheers,
Nicolas

donn · October 14, 2020, 2:58am

I think it will work better if the first parameter (“command to run”) is a complete file path - “/bin/echo”.

That will get the example working. When you try to apply this as you intend, I’m guessing you will encounter new problems related to output buffering. You can address one of those on your end - where you write to the process, consider a flush at critical points, so the data actually becomes available to the process.

If this is simply a bulk data processing job, that may be all you need. If it’s intermittent data, then you have the same problem on the other end, where you will need access to the code to fix it - if the process output goes through C stdio or something like it, and doesn’t explicitly flush the buffer, it may not become available for a while.

The other pitfall here is that the pipe device itself is of limited size, and when it fills up, the process blocks. If you’re writing, you can’t read at the same time, so you fill up the pipe on your end, and you’re both blocked. And you have two outputs, stderr and stdout, so the process can block on one while you’re reading on the other.

You can address some of this with Unix.select, but bearing in mind that this operates on the underlying file descriptor and ignores buffered data - so there might be data that input could read, but select will say there’s nothing.
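
One way to sidestep both the deadlock and the buffering caveat is to never use the channel input functions at all and read the raw descriptors directly. The following is only a sketch under some assumptions: a Unix-like system, the unix library linked in, and no handling of EINTR from Unix.select. The name capture_both is made up for illustration.

```ocaml
(* Sketch: drain stdout and stderr together with Unix.select.
   [capture_both] is a hypothetical helper. Requires the unix library. *)
let capture_both cmd =
  let ((proc_out, proc_in, proc_err) as p) =
    Unix.open_process_full cmd (Unix.environment ())
  in
  (* We send no input, so close the child's stdin right away *)
  close_out proc_in;
  (* Work on raw descriptors only, so channel buffering never hides data *)
  let fd_out = Unix.descr_of_in_channel proc_out in
  let fd_err = Unix.descr_of_in_channel proc_err in
  let out_buf = Buffer.create 4096 and err_buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop fds =
    match fds with
    | [] -> ()
    | _ ->
      (* A negative timeout blocks until some descriptor is readable
         or has reached end of file *)
      let ready, _, _ = Unix.select fds [] [] (-1.0) in
      let still_open =
        List.filter
          (fun fd ->
            if not (List.mem fd ready) then true
            else begin
              let n = Unix.read fd chunk 0 (Bytes.length chunk) in
              let target = if fd = fd_out then out_buf else err_buf in
              Buffer.add_subbytes target chunk 0 n;
              (* Zero bytes read means end of file: stop watching this fd *)
              n > 0
            end)
          fds
      in
      loop still_open
  in
  loop [ fd_out; fd_err ];
  (* Both pipes are drained; now wait for the process and close the channels *)
  let status = Unix.close_process_full p in
  (Buffer.contents out_buf, Buffer.contents err_buf, status)

let () =
  let out, err, _status = capture_both "echo out; echo err 1>&2" in
  Printf.printf "stdout: %sstderr: %s" out err
```

Since neither pipe is left unread while the other is being drained, the child cannot block on a full pipe.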

Temporary files are a very professional way to go.

ffreling · October 14, 2020, 11:20am

Even if I don’t close them myself I have this error:

#!/usr/bin/env utop

open Unix;;

let print_chan channel =
  let rec loop () =
      let () = print_endline (input_line channel) in
      loop ()
    in
  try loop ()
  with End_of_file -> ();;

let () =
  let ((ocaml_stdout, ocaml_stdin, ocaml_stderr) as p) = Unix.open_process_args_full "echo" [| "echo"; "foo" |] (Unix.environment ()) in
  let _ = Unix.close_process_full p in
  print_chan ocaml_stdout;
  print_chan ocaml_stderr;
  print_endline "terminado!";

|> ./capture.ml
Exception: Sys_error "Bad file descriptor".

I guess once I close the process I can’t access the channels anymore.
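
That matches how the API is meant to be used: read first, then call Unix.close_process_full, which closes the channels and waits for the process. A minimal sketch of that order, assuming a Unix-like system with the unix library available and output small enough to fit in the pipe buffers (the helper names read_lines and capture are made up for illustration):

```ocaml
(* Sketch: read everything first, then close the process.
   [read_lines] and [capture] are hypothetical helpers.
   Requires the unix library. *)

(* Read all remaining lines from a channel into a list *)
let read_lines channel =
  let rec loop acc =
    match input_line channel with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

let capture cmd =
  let ((proc_out, proc_in, proc_err) as p) =
    Unix.open_process_full cmd (Unix.environment ())
  in
  (* We send nothing, so close the child's stdin to signal end of input *)
  close_out proc_in;
  let out_lines = read_lines proc_out in
  let err_lines = read_lines proc_err in
  (* Only now wait for the process; this also closes the channels *)
  let status = Unix.close_process_full p in
  (out_lines, err_lines, status)

let () =
  let out_lines, err_lines, status = capture "echo foo" in
  List.iter print_endline out_lines;
  List.iter prerr_endline err_lines;
  match status with
  | Unix.WEXITED code -> Printf.printf "exited with %d\n" code
  | Unix.WSIGNALED n | Unix.WSTOPPED n -> Printf.printf "signal %d\n" n
```

Reading stdout to the end before touching stderr is fine for small outputs, but it can block if the command writes a lot to stderr.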

ffreling · October 14, 2020, 11:46am

Thanks! Specifying the whole binary path did fix the issue.
The difference between open_process and open_process_args was not very clear in my head, but since I want to run commands in a shell, I should use open_process.

Here is my working code in case it can help someone:

#!/usr/bin/env utop

open Unix;;

let print_chan channel =
  let rec loop () =
      let () = print_endline (input_line channel) in
      loop ()
    in
  try loop ()
  with End_of_file -> close_in channel;;

let () =
  let (ocaml_stdout, ocaml_stdin, ocaml_stderr) = Unix.open_process_full "echo foo" [||] in
  close_out ocaml_stdin;
  print_chan ocaml_stdout;
  print_chan ocaml_stderr;
  print_endline "terminado!";

As for using temporary files, I don’t think I need it in my use case but you made good points about the pipe size limitation. I’ll keep it in mind if I run into pipe issues.

PatrickMacdonald · November 11, 2020, 8:04pm

Good topic and excellent solutions, thanks all!

chshersh · April 16, 2023, 2:00pm

I found this topic when encountering a similar problem, and experienced several challenges I did not expect to face. Still, I found the discussion extremely helpful!

I’ve spent several hours trying to figure out things that were not documented and that I couldn’t find anywhere else, so I would like to share my findings to save someone else some time (especially beginners).

1. Close channels only after reading them.
   This is mentioned in the middle of the discussion, but I wanted to highlight it separately.
2. Unix.open_process_full doesn’t pass environment variables to the running process unless you supply them.
   So if you expect the running command to have access to env variables like $HOME, you need to pass them explicitly in the environment argument (for example, Unix.environment ()).
3. Use Unix.system to run commands with all env variables passed and original output preserved.
   If you don’t care about reading the output of the process, the Unix.system function is the easiest way to run external processes from an OCaml program.
4. Base doesn’t have the command function in the Sys module.
   The Sys modules in Base and the OCaml stdlib are different but look the same. I’ve spent an unreasonable amount of time trying to figure out why I can’t use Sys.command when in fact I had open Base. Some StackOverflow answers mention this function for running external processes. Use Unix.system instead.
5. You can’t preserve the original interleaved output of stdout and stderr when reading the process output.
   Or at least, I haven’t found a way. A process usually writes text to both stdout and stderr in arbitrary order, and you can’t preserve this order if you are reading from the corresponding handles independently.
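
For the environment and interleaving points, one possible workaround is to let the shell merge stderr into stdout before OCaml ever sees the data. This is only a sketch, assuming a POSIX shell and the unix library; run_merged and read_all are made-up names. The order you get is the order in which the command wrote to the shared pipe, so a program that buffers its own stdout internally may still appear reordered, and you lose the ability to tell the two streams apart.

```ocaml
(* Sketch: inherit the environment and merge stderr into stdout.
   [read_all] and [run_merged] are hypothetical helpers.
   Requires the unix library. *)

(* Read a whole channel into a string, one character at a time *)
let read_all channel =
  let buf = Buffer.create 4096 in
  let rec loop () =
    match input_char channel with
    | c -> Buffer.add_char buf c; loop ()
    | exception End_of_file -> Buffer.contents buf
  in
  loop ()

let run_merged cmd =
  (* The subshell plus "2>&1" sends stderr to the same pipe as stdout.
     Unix.open_process_in takes no environment argument: the child
     inherits the environment of the current process. *)
  let ic = Unix.open_process_in ("( " ^ cmd ^ " ) 2>&1") in
  (* Read first, close afterwards *)
  let output = read_all ic in
  let status = Unix.close_process_in ic in
  (output, status)

let () =
  let output, _status =
    run_merged "echo first; echo second 1>&2; echo third"
  in
  print_string output
```

If you need the two streams separately and still want the environment inherited, passing Unix.environment () as the second argument of Unix.open_process_full is the other option.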

Based on the above, I’ve implemented a small interface for running external processes and reading their output in my recent tool:
