Using OCaml as a scripting language - piping sh commands

I am trying to use OCaml to pipe the output of one command into another (I would also be interested in feeding a string or an IO stream into a sh command). For example, I would like to do the equivalent of cat foo.txt | grep thing, or pipe the result of one of my OCaml functions into grep.

Quite surprisingly, neither the Stdlib nor the Batteries Sys module exposes any way to handle the output of Sys.command directly (I would have expected optional input and output arguments defaulting to stdin and stdout, or something along those lines). The Batteries IO module does expose a pipe function, but it’s not clear to me how it would interact with the Sys module. Any ideas, or other modules/packages I could use?

Thanks

I think you may be interested in https://github.com/janestreet/shexp .

I had a good experience with shexp.

@grayswandyr @nojb Thanks for the suggestion. I just found shcaml (http://tov.github.io/shcaml/doc/) and was going to give it a try. Do you know how it compares to shexp?

AFAIK shcaml is unmaintained, but the approach is very nice indeed.

Well, I have trouble with shexp. I installed it with opam (opam install shexp), and this test file works perfectly well: https://github.com/janestreet/shexp/blob/master/process-lib/examples/script.ml
But if I try to use Shexp in my project by adding (libraries shexp) to my dune file, dune can’t find it:

Error: Library "shexp" not found.
Hint: try: dune external-lib-deps --missing ./inject_code.exe

And when I try the hint:

Error: The following libraries are missing in the default context:

  • shexp
    Hint: try: opam install shexp.

This happens even though shexp is indeed installed. Did I miss something obvious?

Maybe try shexp.process in your dune file. I don’t know for sure.

Good call :) Thank you all!

Sorry to bother you again, but what is the idiomatic way of treating a file as a stream? I use

let readlines filename =
 (* [Stdout] selects the stream to capture; an empty list captures nothing *)
 capture_unit [Stdout] (run "cat" [filename])
 >>| String.split_on_char '\n'

which has type string -> (string list) Shexp_process.t (the type I want), but it feels hackish.

I would also take a look at Bos. Do you want to process lines as they come in, or can you afford to read all input and process the resulting string?

I guess for now I can afford to read all the input and process the resulting string. That’s what I’m doing right now; I will see later whether it is good enough performance-wise.

I think you can use fold_lines to accomplish what you are looking for:

val fold_lines : init:'a -> f:('a -> string -> 'a t) -> 'a t

There are also alternatives:

val iter_lines : (string -> unit t) -> unit t
val fold_chunks : sep:char -> init:'a -> f:('a -> string -> 'a t) -> 'a t
val iter_chunks : sep:char -> (string -> unit t) -> unit t

These all fold over lines/chunks in their input, so you should pipe another Shexp_process.t into them to get the effect you want.

That doesn’t look like exactly what I want, though. For example, how can I grep all lines matching “FOO” in a file foo.txt, i.e. something equivalent to grep 'FOO' foo.txt? It doesn’t look like these functions allow me to do it without fussing with open, lines_of or the like (this may be the right way to do it, but I was wondering if there was another, more direct way). Can I avoid doing the following?

return @@ File.lines_of filename

You use fold_lines and iter_lines by defining a “process” that gets executed for each line of input, e.g.:

open Shexp_process

let proc =
  capture_unit [] (run "cat" [ filename ])
  |- iter_lines (fun line ->
    if String.is_substring line ~substring:"foo"
    then print line
    else return ())

Of course, there are more efficient ways to implement the actual string searching, but I hope this example helps.

I’m not sure we are talking about the same thing ^^ I was looking for a more “elegant” way to do this:

 capture_unit [Stdout] (run "cat" [filename])

But that may be the right way to do it, I don’t know. For that matter, my actual use case (for now) is to take two files, template and src, find two markers (like BEGIN and END) in template, and replace the marker lines and every line between them with the content of src.
For example, with template:

aaaaa
BEGIN
ff
fee
END
bbbbbb

and src:

another thing
Foo

I would like to get :

aaaaa
another thing
Foo
bbbbbb

I am not sure how that would fit with your fold functions, so for now I would like to operate on a string list, but if you have a suggestion, I would like to see it :)

Ah, my mistake. I think you don’t actually want the capture part, because it will bring the entire contents into memory. You want to just pipe the run into iter_lines like so:

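open Base (* assumed here: String.is_substring comes from Base *)
open Shexp_process
open Shexp_process.Infix

(* Print every line of [filename] that contains "foo". *)
let proc filename =
  run "cat" [ filename ]
  |- iter_lines (fun line ->
    if String.is_substring line ~substring:"foo"
    then print line
    else return ())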

You could use the accumulator of fold_lines to represent the current state (e.g., scanning for BEGIN, scanning for END, etc.), but that might be a bit awkward. Another suggestion would be to use something more state machine-like instead of shexp:

open! Base
open! Stdio

let rec scan_for_begin (template : In_channel.t) (src : In_channel.t) output_lines =
  match In_channel.input_line template with
  | None -> List.rev output_lines (* eof *)
  | Some "BEGIN" ->
    scan_for_end template src output_lines
  | Some other -> scan_for_begin template src (other :: output_lines)
and scan_for_end template src output_lines =
  match In_channel.input_line template with
  | None -> failwith "expected END line, got EOF"
  | Some "END" -> scan_for_begin template src output_lines
  | Some _other ->
    let replacement_line = In_channel.input_line_exn src in
    scan_for_end template src (replacement_line :: output_lines)
;;

This more directly expresses the behavior that I think you are looking for, rather than trying to reify your state into some accumulator object for the fold.
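For the string-list version asked about earlier, a minimal sketch using only the standard library could look like the following. It assumes the markers are lines equal to exactly BEGIN and END, drops the marker lines as in the expected output, and inserts all of src no matter how many template lines it replaces. The name splice is made up for illustration.

```ocaml
(* Replace the block delimited by the BEGIN and END marker lines (markers
   included) with all the lines of [src]. Works on plain string lists. *)
let splice ~template ~src =
  (* [copy] keeps template lines until it meets BEGIN *)
  let rec copy acc = function
    | [] -> List.rev acc
    | "BEGIN" :: rest -> skip (List.rev_append src acc) rest
    | line :: rest -> copy (line :: acc) rest
  (* [skip] drops template lines until it meets END *)
  and skip acc = function
    | [] -> failwith "expected END line, got end of template"
    | "END" :: rest -> copy acc rest
    | _ :: rest -> skip acc rest
  in
  copy [] template

let () =
  let template = [ "aaaaa"; "BEGIN"; "ff"; "fee"; "END"; "bbbbbb" ] in
  let src = [ "another thing"; "Foo" ] in
  List.iter print_endline (splice ~template ~src)
```

If the template contains several BEGIN/END blocks, this sketch inserts src into each of them; whether that is desirable depends on the use case.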

I am not sure why no one mentioned it yet (perhaps I will be the one learning something today), but the functions you are hoping for exist in the standard library, with no need for a third-party library. This may be less comfortable to use than a higher-level API, but still it answers your initial question. The functions you are looking for are located in module Unix (which more generally contains everything related to system programming). They are:

val Unix.pipe : ?cloexec:bool -> unit -> file_descr * file_descr
val Unix.create_process : string -> string array -> file_descr -> file_descr -> file_descr -> int

Unix.pipe () creates an anonymous pipe and returns its two ends as file descriptors. The first descriptor is the read end (the input of the right-hand command) and the second descriptor is the write end (the output of the left-hand command).

Unix.create_process prgm args fd0 fd1 fd2 forks a new process by invoking the program prgm with the arguments array args (this array comprises argv[0], so you have to include the program name as the first “argument”). The file descriptors fd0, fd1 and fd2 are the input, normal output and error output for the new process. The function returns the PID of the new process. The function returns immediately; depending on your usage you may need to wait for termination of the forked-off process (see Unix.wait{,pid}).

There are more functions of interest, they are described in the sections “Pipes and redirections” and “High-level process and redirection management” of the manual page about Unix.

Example: emulating the shell command cat | grep -i a can be done like this:

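let () =
  (* cloexec keeps the children from inheriting stray copies of the pipe ends *)
  let (pipe_right, pipe_left) = Unix.pipe ~cloexec:true () in
  let pid_left  = Unix.create_process "cat" [| "cat" |] Unix.stdin pipe_left Unix.stderr in
  (* close our copy of the write end, otherwise grep never sees end-of-file *)
  Unix.close pipe_left;
  let pid_right = Unix.create_process "grep" [| "grep" ; "-i" ; "a" |] pipe_right Unix.stdout Unix.stderr in
  Unix.close pipe_right;
  let (_, _status_left)  = Unix.waitpid [] pid_left in
  let (_, _status_right) = Unix.waitpid [] pid_right in
  (* you can check the exit status of both processes here *)
  ()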

These functions should be favored over Sys.command which is a quick-and-dirty trick to access the shell rather than an actual process management API. Sys.command invokes the user’s shell and depends on its syntax; Unix functions do not.
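To feed a string computed in OCaml into a command and read its output back, the file descriptors can be wrapped into ordinary channels with Unix.out_channel_of_descr and Unix.in_channel_of_descr. The following is a rough sketch, not a robust implementation: the helper names read_lines and filter_string are invented for illustration, it needs the unix library, and it writes all the input before reading any output, so it can block if the command produces more output than the pipe buffer holds before it has consumed its input.

```ocaml
(* Requires the unix library, e.g. (libraries unix) in a dune file. *)

(* Read all remaining lines from a channel. *)
let read_lines ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Run [prog] with [args] (argv[0] included), feed it [input] on stdin,
   and return the lines it writes to stdout. *)
let filter_string prog args input =
  let (child_in, to_child) = Unix.pipe ~cloexec:true () in
  let (from_child, child_out) = Unix.pipe ~cloexec:true () in
  let pid = Unix.create_process prog args child_in child_out Unix.stderr in
  (* the parent keeps only its own ends of the two pipes *)
  Unix.close child_in;
  Unix.close child_out;
  let oc = Unix.out_channel_of_descr to_child in
  output_string oc input;
  (* close_out flushes and closes to_child, so the child sees end-of-file *)
  close_out oc;
  let ic = Unix.in_channel_of_descr from_child in
  let lines = read_lines ic in
  close_in ic;
  let (_, _status) = Unix.waitpid [] pid in
  lines
```

For example, filter_string "grep" [| "grep"; "FOO" |] text keeps the lines of text that contain FOO, without going through a shell.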

Interesting. I saw the Unix module, but I was not sure about the file_descr type and how it interacts with IO. I have settled on Shexp for now, but thanks for your answer.

If you are just trying to capture the command’s output, there is this function in the Unix module:

val open_process_in : string -> in_channel

The first argument is the command which is interpreted by /bin/sh
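A small sketch of how this might be used, assuming the unix library is linked; the helper name command_lines is hypothetical. Since the string goes through /bin/sh, shell pipes such as cat foo.txt | grep thing work as written, but any file name interpolated into the command has to be quoted (for instance with Filename.quote).

```ocaml
(* Requires the unix library, e.g. (libraries unix) in a dune file. *)

(* Run [cmd] through /bin/sh and return the lines it prints on stdout,
   together with its exit status. *)
let command_lines cmd =
  let ic = Unix.open_process_in cmd in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = loop [] in
  (* close_process_in waits for the command and returns its exit status *)
  let status = Unix.close_process_in ic in
  (lines, status)
```

The channel should be closed with Unix.close_process_in rather than close_in, so that the child process is waited for.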
