I am trying to use OCaml to pipe the output of one command into another (I would also be interested in feeding a string or an I/O stream into a sh command). For example, I would like to do the equivalent of cat foo.txt | grep thing, or pipe the result of one of my OCaml functions into grep.
Quite surprisingly, neither the Stdlib nor the Batteries Sys module exposes any way to handle the output of Sys.command directly (I would have expected optional input and output arguments defaulting to stdin and stdout, or something along those lines). The Batteries IO module does expose a pipe function, but it's not clear to me how it would interact with the Sys module. Any ideas, or other modules/packages I could use?
Well, I'm having trouble with shexp. I installed it with opam (opam install shexp), and this test file works perfectly well: https://github.com/janestreet/shexp/blob/master/process-lib/examples/script.ml
However, if I try to use Shexp in my project by adding (libraries shexp) to my dune file, dune can't seem to find it:
```
Error: Library "shexp" not found.
Hint: try: dune external-lib-deps --missing ./inject_code.exe
```
And when I try the hint:
```
Error: The following libraries are missing in the default context:
shexp
Hint: try: opam install shexp.
```
This is despite shexp being installed. Did I miss something obvious?
I guess for now I can afford to read all the input and process the resulting string. That's what I'm doing right now; I will see later whether it is good enough performance-wise.
I think you can use fold_lines to accomplish what you are looking for:
```ocaml
val fold_lines : init:'a -> f:('a -> string -> 'a t) -> 'a t
```
There are also alternatives:
```ocaml
val iter_lines : (string -> unit t) -> unit t
val fold_chunks : sep:char -> init:'a -> f:('a -> string -> 'a t) -> 'a t
val iter_chunks : sep:char -> (string -> unit t) -> unit t
```
These all fold over lines/chunks in their input, so you should pipe another Shexp_process.t into them to get the effect you want.
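To see the shape of such a fold without installing anything, here is a rough stdlib-only analogue: an accumulator threaded through every line read from a channel. This is not Shexp code; fold_channel_lines, contains and count_matching are hypothetical helpers, and in Shexp the lines would come from a piped process while f would return a 'a t instead of a plain value.

```ocaml
(* Stdlib analogue of the fold_lines idea: thread an accumulator through
   every line of a channel. Hypothetical helper, not part of Shexp. *)
let fold_channel_lines (ic : in_channel) ~(init : 'a) ~(f : 'a -> string -> 'a) : 'a =
  let rec loop acc =
    match input_line ic with
    | line -> loop (f acc line)
    | exception End_of_file -> acc
  in
  loop init

(* Naive substring test, enough for the example. *)
let contains ~needle s =
  let n = String.length needle and m = String.length s in
  let rec at i = i + n <= m && (String.sub s i n = needle || at (i + 1)) in
  at 0

(* Count the lines of the file at [path] that contain [needle]. *)
let count_matching ~needle path =
  let ic = open_in path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
    fold_channel_lines ic ~init:0 ~f:(fun n line ->
      if contains ~needle line then n + 1 else n))
```

The fold only keeps the accumulator in memory, never the whole file.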
It doesn't look like exactly what I want, though? For example, how can I grep all lines matching "FOO" in a file foo.txt, i.e. something equivalent to grep "FOO" foo.txt? It doesn't look like these functions let me do that without fussing with open, lines_of or the like (that may be the right way to do it, but I was wondering if there was a more direct way). Can I avoid doing
You use fold_lines and iter_lines by defining a "process" that gets executed for each line of input, e.g.:
```ocaml
open Shexp_process

let proc =
  capture_unit [] (run "cat" [ filename ])
  |- iter_lines (fun line ->
       if String.is_substring line ~substring:"foo"
       then print line
       else return ())
```
Of course, there are more efficient ways to implement the actual string searching, but I hope this example helps.
I'm not sure we are talking about the same thing ^^ I was looking for a more "elegant" way to write this part:
```ocaml
capture_unit [] (run "cat" [ filename ])
```
But that may be the right way to do it, I don't know. For what it's worth, my actual use case (for now) is to take two files, template and src, find two markers (like BEGIN and END) in template, and replace the lines from BEGIN to END, markers included, with the content of src.
For example, with template:
```
aaaaa
BEGIN
ff
fee
END
bbbbbb
```
and src:
```
another thing
Foo
```
I would like to get:
```
aaaaa
another thing
Foo
bbbbbb
```
I am not sure how this would fit with your fold functions, so for now I would like to operate on a string list, but if you have a suggestion, I would like to see it.
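Operating on string lists, one possible sketch is to split the template at the two markers and glue src in between. It assumes the markers are whole lines and that the whole block, markers included, is replaced by all of src; splice and split_at are hypothetical names.

```ocaml
(* Replace the lines from [begin_marker] to [end_marker] (markers included)
   in [template] with all the lines of [src]. Lines outside the block are
   kept. Raises Failure if a marker is missing. Hypothetical helper. *)
let splice ?(begin_marker = "BEGIN") ?(end_marker = "END") ~template ~src () =
  (* Split [lines] at the first line equal to [marker]:
     returns (lines before it, lines after it), marker excluded. *)
  let rec split_at marker before = function
    | [] -> failwith ("marker not found: " ^ marker)
    | l :: rest when l = marker -> (List.rev before, rest)
    | l :: rest -> split_at marker (l :: before) rest
  in
  let head, tail = split_at begin_marker [] template in
  let _block, after = split_at end_marker [] tail in
  head @ src @ after
```

To obtain the lists from files, reading each file with In_channel.input_all (OCaml 4.14 or later) and splitting on '\n' is one option; a trailing newline then yields a final empty string.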
Ah, my mistake. I think you don't actually want the capture part, because it brings the entire contents into memory. You just want to pipe the output of run into iter_lines (or fold_lines), like so:
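```ocaml
(* Assumption: built with (libraries base shexp.process) in dune. *)
open Base
open Shexp_process

(* Print every line of [filename] containing "foo", streaming line by line. *)
let proc filename =
  run "cat" [ filename ]
  |- iter_lines (fun line ->
       if String.is_substring line ~substring:"foo"
       then echo line (* echo terminates the output with a newline *)
       else return ())

let () = eval (proc "foo.txt")
```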
You could use the accumulator of fold_lines to represent the current state (e.g., scanning for BEGIN, scanning for END, etc.), but that might be a bit awkward. Another suggestion would be to use something more state machine-like instead of shexp:
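For comparison, here is roughly what keeping the state in a fold accumulator could look like. It is a stdlib sketch over an in-memory list using List.fold_left, standing in for Shexp's fold_lines (whose callback returns a 'a t); the state type and splice_fold are hypothetical, and it replaces the whole BEGIN/END block with all of src, as in the use case described earlier.

```ocaml
(* The three phases of the splice, made explicit. *)
type state =
  | Before  (* copying template lines, looking for BEGIN *)
  | Inside  (* dropping template lines, looking for END *)
  | After   (* block done, copying the rest *)

(* Hypothetical helper: the accumulator is (state, reversed output). *)
let splice_fold ~template ~src =
  let step (st, acc) line =
    match st, line with
    | Before, "BEGIN" -> (Inside, List.rev_append src acc)
    | Before, l -> (Before, l :: acc)
    | Inside, "END" -> (After, acc)
    | Inside, _ -> (Inside, acc)
    | After, l -> (After, l :: acc)
  in
  match List.fold_left step (Before, []) template with
  | (After, acc) -> List.rev acc
  | (Before | Inside), _ -> failwith "expected a BEGIN/END block"
```

An explicit variant keeps the fold readable, but every transition still has to be routed through the accumulator, which is the awkwardness mentioned above.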
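```ocaml
open! Base
open! Stdio

(* Copy template lines to the output until the BEGIN marker. *)
let rec scan_for_begin (template : In_channel.t) (src : In_channel.t) output_lines =
  match In_channel.input_line template with
  | None -> List.rev output_lines (* eof *)
  | Some "BEGIN" -> scan_for_end template src output_lines
  | Some other -> scan_for_begin template src (other :: output_lines)

(* Until END, replace each template line with the next line of src. *)
and scan_for_end template src output_lines =
  match In_channel.input_line template with
  | None -> failwith "expected END line, got EOF"
  | Some "END" -> scan_for_begin template src output_lines
  | Some _other ->
    let replacement_line = In_channel.input_line_exn src in
    scan_for_end template src (replacement_line :: output_lines)
;;

(* Example driver; the file names are hypothetical. *)
let () =
  In_channel.with_file "template.txt" ~f:(fun template ->
    In_channel.with_file "src.txt" ~f:(fun src ->
      scan_for_begin template src [] |> List.iter ~f:print_endline))
;;
```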
This more directly expresses the behavior that I think you are looking for, rather than trying to reify your state into some accumulator object for the fold.
I am not sure why no one has mentioned it yet (perhaps I will be the one learning something today), but the functions you are hoping for ship with OCaml itself, in the unix library, with no need for a third-party library. They may be less comfortable to use than a higher-level API, but they answer your initial question. They live in module Unix (which more generally contains everything related to system programming):
```ocaml
val Unix.pipe : ?cloexec:bool -> unit -> file_descr * file_descr
val Unix.create_process : string -> string array -> file_descr -> file_descr -> file_descr -> int
```
Unix.pipe () creates an anonymous pipe and returns its two ends as file descriptors. The first descriptor is the read end (it becomes the input of the right-hand command) and the second is the write end (it receives the output of the left-hand command).
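One quick way to check which end is which is to push a few bytes through a pipe within a single process. A minimal sketch, assuming the program is linked with the unix library; pipe_roundtrip is a hypothetical name, and since nothing reads while the write happens, it only suits messages small enough to fit in the pipe buffer.

```ocaml
(* Data written to the second descriptor returned by Unix.pipe (the write
   end) comes out of the first one (the read end). Hypothetical helper;
   needs the unix library. Nothing reads while we write, so [msg] must fit
   in the pipe buffer. *)
let pipe_roundtrip (msg : string) : string =
  let (read_end, write_end) = Unix.pipe () in
  let written = Unix.write_substring write_end msg 0 (String.length msg) in
  Unix.close write_end;
  let buf = Bytes.create written in
  let n = Unix.read read_end buf 0 written in
  Unix.close read_end;
  Bytes.sub_string buf 0 n
```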
Unix.create_process prgm args fd0 fd1 fd2 starts a new process running the program prgm (looked up in the PATH) with the argument array args (this array includes argv[0], so you have to put the program name as the first "argument"). The file descriptors fd0, fd1 and fd2 become the standard input, standard output and standard error of the new process. The function returns the PID of the new process, and it returns immediately; depending on your usage, you may need to wait for the process to terminate (see Unix.wait and Unix.waitpid).
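The same two functions also cover the other half of the original question: feeding an OCaml string to a command and collecting its output. A sketch under stated assumptions: OCaml 4.14 or later (for In_channel.input_all), the unix library, and modest amounts of data, since all input is written before any output is read; run_with_input is a hypothetical helper.

```ocaml
(* Run [prog] with [args], writing [input] to its stdin and collecting its
   stdout and exit status. Hypothetical helper; needs the unix library and
   OCaml >= 4.14. All input is written before any output is read, so large
   input plus large output can deadlock once both pipe buffers are full. *)
let run_with_input (prog : string) (args : string array) (input : string)
    : string * Unix.process_status =
  (* cloexec: the child must not inherit the ends kept by the parent *)
  let (child_stdin, to_child) = Unix.pipe ~cloexec:true () in
  let (from_child, child_stdout) = Unix.pipe ~cloexec:true () in
  let pid = Unix.create_process prog args child_stdin child_stdout Unix.stderr in
  (* The child now has these ends; closing our copies lets EOF propagate. *)
  Unix.close child_stdin;
  Unix.close child_stdout;
  let oc = Unix.out_channel_of_descr to_child in
  output_string oc input;
  close_out oc; (* the child sees end-of-file on its stdin *)
  let ic = Unix.in_channel_of_descr from_child in
  let output = In_channel.input_all ic in
  close_in ic;
  let (_, status) = Unix.waitpid [] pid in
  (output, status)
```

For example, run_with_input "grep" [| "grep"; "-i"; "a" |] "Apple\nberry\nbanana\n" returns the matching lines "Apple\nbanana\n" together with Unix.WEXITED 0.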
There are more functions of interest; they are described in the sections "Pipes and redirections" and "High-level process and redirection management" of the Unix module documentation.
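For instance, the equivalent of grep "FOO" foo.txt with the result as an OCaml list can use Unix.open_process_args_in from the high-level section, which runs the program directly (no shell) and returns a channel on its standard output. A sketch assuming a recent OCaml (4.12 or later) and the unix library; command_lines is a hypothetical name.

```ocaml
(* Collect the output lines of [prog] and its exit status, without a shell.
   Hypothetical helper; needs the unix library. *)
let command_lines (prog : string) (args : string array)
    : string list * Unix.process_status =
  let ic = Unix.open_process_args_in prog args in
  let rec read acc =
    match input_line ic with
    | line -> read (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = read [] in
  (* close_process_in waits for the process and returns its status *)
  let status = Unix.close_process_in ic in
  (lines, status)
```

A call would look like command_lines "grep" [| "grep"; "FOO"; "foo.txt" |]. Note that grep exits with status 1 when nothing matches, so Unix.WEXITED 1 is not necessarily an error here.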
Example: emulating the shell command cat | grep -i a can be done like this:
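```ocaml
let () =
  (* ~cloexec:true: the children must not inherit stray copies of the pipe ends *)
  let (pipe_right, pipe_left) = Unix.pipe ~cloexec:true () in
  let pid_left = Unix.create_process "cat" [| "cat" |] Unix.stdin pipe_left Unix.stderr in
  let pid_right = Unix.create_process "grep" [| "grep" ; "-i" ; "a" |] pipe_right Unix.stdout Unix.stderr in
  (* the parent closes its copies too, otherwise grep never sees end-of-file *)
  Unix.close pipe_left;
  Unix.close pipe_right;
  let (_, status_left) = Unix.waitpid [] pid_left in
  let (_, status_right) = Unix.waitpid [] pid_right in
  (* you can check the exit status of both processes here *)
  ignore (status_left, status_right)
```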
These functions should be favored over Sys.command, which is a quick-and-dirty way to reach the shell rather than an actual process-management API. Sys.command hands its argument to the system shell (/bin/sh on Unix) and therefore depends on its syntax and quoting rules; the Unix functions above run the program directly.
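If Sys.command remains the convenient choice, Filename.quote_command (OCaml 4.10 or later) at least takes care of quoting the program, its arguments and optional redirections for the platform's shell. A sketch; grep_to_file is a hypothetical helper.

```ocaml
(* Run grep on the file [input], writing its matches to the file [output],
   via Sys.command. Filename.quote_command (OCaml >= 4.10) quotes the
   program, the arguments and the redirection targets for the shell, so
   file names containing spaces or quotes are passed through intact.
   Hypothetical helper. *)
let grep_to_file ~(pattern : string) ~(input : string) ~(output : string) : int =
  let cmd = Filename.quote_command "grep" ~stdin:input ~stdout:output [ pattern ] in
  (* exit code: 0 if a line matched, 1 if none did, greater than 1 on error *)
  Sys.command cmd
```

This still goes through the shell, so it remains the quick-and-dirty option; it just removes the quoting pitfalls.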
Interesting. I had seen the Unix module, but I was not sure about the file_descr type and how it interacts with IO. I settled on Shexp for now, but thanks for your answer anyway!