Read a file line by line and print line numbers

Ehh, no need to apologize. We Americans are famous for being monolingual, after all. My only other language is French, and I’m quite certain that your English is several hundred times better than my French!


Now that streams are deprecated, what are the other ways to do more or less the same thing?

I suspect the idea is that for lazy streams you should use the Seq module, which is functional. Compare Seq.unfold with Stream.from.
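To make the comparison concrete, here is a minimal sketch, assuming OCaml 4.11 or later (where Seq.unfold is available). Stream.from took a function from an index to an optional element; Seq.unfold instead takes a step function from a state to an optional (element, next state) pair, plus an initial state. The name countdown is made up for illustration.

```ocaml
(* The step function returns None to stop, or Some (element, next_state). *)
let countdown n =
  Seq.unfold (fun k -> if k <= 0 then None else Some (k, k - 1)) n

let () =
  (* Nothing is computed until the sequence is traversed. *)
  Seq.iter (fun k -> Printf.printf "%d\n" k) (countdown 3)
```

Because the state is passed explicitly, the same sequence value can be traversed again from the start, which was not possible with a Stream.t.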


I’ll try to write a program that lazily reads a file line by line and detects EOF.

So we should be using Seq.unfold? Something like this:

let ans =
  let filename = print_string "Enter filename: "; read_line() in
  Seq.unfold
    (
      fun c ->
        try
          Some((input_line c), c)
        with
          End_of_file
        | _ -> close_in c; None
    ) (open_in filename)

let () =
  Seq.iter print_endline ans

Here is my latest code,

exception Dont_get_here;;

let fprint_file channel =
    let t = ref 0 in
    let rec fprint_line channel =
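        (* Match on the call itself: an exception raised in a separate
           let-binding would not be caught by the exception branch below. *)
        match input_line channel with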
        | line ->
            incr t;
            Printf.printf "%d:" !t;
            print_endline line;
            fprint_line channel
        | exception End_of_file -> ()
    in
    fprint_line channel
in

let fprint_file (name : string) =
    let channel = open_in name in
    try
      let () = fprint_file channel in
      let () = raise Dont_get_here in
      let () = close_in channel in
      ()
    with e ->
      let () = close_in_noerr channel in
      (* Debug message *)
      print_string (Printexc.to_string e)
in

let () = fprint_file "file.txt" in
()

@octachron: what is the advantage of using Seq.unfold in your proposal? For instance, here is code from ocaml/lexing.ml at 9a157026f115364635f8fe0ae5805e15ef071de0 in ocaml/ocaml on GitHub:

let from_channel ?with_positions ic =
  from_function ?with_positions (fun buf n -> input ic buf 0 n)

where input is used, but no Seq.t is constructed. Is there a reason why no Seq.t is constructed there, while in the present case it is better to construct one?
If the advantage is to separate reading from the rest, why don’t the functions in lexing.ml return Seq.t values? Or is it a hard choice between Seq.t values, which are immutable, and the coding style of lexing.ml, which is imperative with a mutable lexbuf?
I am writing a lexer and wondering what the best solution is, and which “buffering strategy” should be used for better performance. (A pointer to a post or tutorial would already be great.)
Thanks.

Building an explicit sequence allows you to reuse common functions on sequences.
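As a rough illustration of that reuse, here is a sketch assuming OCaml 4.14 or later (for In_channel, Seq.mapi and Seq.take). The names lines and first_nonempty are invented for this example; once the lines are a Seq.t, filtering, numbering and truncation come from the standard Seq functions instead of a hand-written loop.

```ocaml
(* Lazily yields the lines of an already-open channel. *)
let lines (ic : In_channel.t) : string Seq.t =
  Seq.unfold
    (fun () -> Option.map (fun l -> (l, ())) (In_channel.input_line ic))
    ()

(* First [n] non-empty lines, numbered from 1 among the non-empty ones.
   The result is forced into a list, so the channel only needs to stay
   open for the duration of this call. *)
let first_nonempty n ic =
  lines ic
  |> Seq.filter (fun l -> l <> "")
  |> Seq.mapi (fun i l -> (i + 1, l))
  |> Seq.take n
  |> List.of_seq
```

A possible use is In_channel.with_open_text "file.txt" (first_nonempty 3), which stops reading once three non-empty lines have been found.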

A lexing buffer is not a sequence of tokens; it is an intermediary data structure used to construct a sequence of tokens. It is the lexer generated by ocamllex that could have exported a function of type Lexing.lexbuf -> token Seq.t, if ocamllex did not predate the Seq module.
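Such a function can be written by hand around an existing lexer. The following is only a sketch: the token type and the name tokens_of_lexbuf are hypothetical, and next stands in for whatever entry point ocamllex generated (a function of type Lexing.lexbuf -> token that returns EOF at the end of input).

```ocaml
(* Hypothetical token type; a real one would come from the parser or lexer. *)
type token = INT of int | PLUS | EOF

(* Wraps a lexer entry point into a lazy sequence of tokens.
   The sequence ends at the first EOF, which is not included. *)
let tokens_of_lexbuf (next : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf)
    : token Seq.t =
  Seq.unfold
    (fun () ->
      match next lexbuf with
      | EOF -> None
      | tok -> Some (tok, ()))
    ()
```

Note that the resulting sequence is ephemeral: the lexbuf is mutable, so traversing the sequence a second time continues from wherever the buffer currently is instead of starting over.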


I tried this code, followed by

let test = In_channel.with_open_text "test.txt" numbered_lines
let () = Seq.iter (fun _ -> ()) test

but in utop, I get:

utop # let test = In_channel.with_open_text "test.txt" numbered_lines;;
val test : (int * string) Seq.t = <fun>
utop # let () = Seq.iter (fun _ -> ()) test;;
Exception: (Sys_error "Bad file descriptor").
Raised by primitive operation at Stdlib.input_line.scan in file "stdlib.ml", line 453, characters 12-32
Called from Stdlib.input_line in file "stdlib.ml", line 471, characters 28-39
Called from unknown location
Called from Stdlib__Seq.unfold in file "seq.ml", line 80, characters 8-11
Called from Stdlib__Seq.iter in file "seq.ml", line 73, characters 8-14
Called from unknown location
Called from Stdlib__Fun.protect in file "fun.ml", line 33, characters 8-15
Re-raised at Stdlib__Fun.protect in file "fun.ml", line 38, characters 6-52
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-150

I do not know where this error comes from. The file test.txt itself is ok, since I get

utop # In_channel.(with_open_text "test.txt" (fun ic -> input_line ic));;
- : string option = Some "Ceci est un test."

Remember that Seq.t is lazy. Seq.unfold does not read anything when the sequence is built, so with_open_text returns the still-unevaluated sequence and then closes the file. When you later iterate over the sequence, input_line raises an exception because the channel is already closed.

In this case you need to keep the file open until the seq is fully traversed. A simple way to do that is to use the open_in and close_in functions.
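The laziness at issue here can be observed without any file. This is a small sketch (the names calls and s are made up) in which the step function counts how often it actually runs.

```ocaml
let calls = ref 0

(* The step function records every time it is run. *)
let s =
  Seq.unfold
    (fun n ->
      incr calls;
      if n >= 3 then None else Some (n, n + 1))
    0

let () =
  (* Building the sequence has not run the step function at all. *)
  Printf.printf "after construction: %d calls\n" !calls;
  let xs = List.of_seq s in
  (* Traversal runs it once per element, plus once to find the end. *)
  Printf.printf "after traversal: %d calls, %d elements\n" !calls
    (List.length xs)
```

With a channel in place of the counter, every one of those delayed calls would be an input_line on whatever state the channel is in at traversal time.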


Thanks @yawaramin , it works:

let test filename =
  let ic = In_channel.open_text filename in
  Seq.iter
    (fun (i, s) ->
      print_int i;
      print_string ": ";
      print_endline s)
    (numbered_lines ic);
  In_channel.close ic;;

but we lose the advantage of with_open_text which does a “clean” use of files. Then, it seems the advantage of using Seq is kind of lost. The following looks safer:

let test2 filename =
  In_channel.with_open_text filename (fun ic ->
    let rec aux n ic =
      match In_channel.input_line ic with
      | Some x ->
          print_int n;
          print_string ": ";
          print_endline x;
          aux (n + 1) ic
      | None -> ()
    in
    aux 1 ic);;

since it uses with_open_text, and is self-contained, with no appeal to numbered_lines nor next_line_with_number.

(I see this is in the spirit of @nojb above, which goes even further in terse code.)
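One possible middle ground, sketched under the assumption of OCaml 4.14 or later (for In_channel): keep with_open_text and still use a Seq, as long as the sequence is fully consumed inside the callback. The numbered_lines below is a hypothetical reconstruction, since its original definition is not shown in this part of the thread; print_numbered and read_numbered are names invented for the example.

```ocaml
(* Hypothetical reconstruction of numbered_lines: lazily pairs each line
   of an open channel with its 1-based line number. *)
let numbered_lines (ic : In_channel.t) : (int * string) Seq.t =
  Seq.unfold
    (fun n ->
      match In_channel.input_line ic with
      | Some line -> Some ((n, line), n + 1)
      | None -> None)
    1

(* The sequence is traversed inside the callback, while the channel is
   still open; with_open_text closes it afterwards, even on exceptions. *)
let print_numbered filename =
  In_channel.with_open_text filename (fun ic ->
      Seq.iter (fun (i, s) -> Printf.printf "%d: %s\n" i s) (numbered_lines ic))

(* Alternatively, force the sequence into a list before the channel closes. *)
let read_numbered filename =
  In_channel.with_open_text filename (fun ic -> List.of_seq (numbered_lines ic))
```

The trade-off is that the sequence must not escape the callback: returning numbered_lines ic itself would bring back the "Bad file descriptor" error, and read_numbered gives up laziness by loading every line into memory.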


Yeah, in general iterating efficiently over some resource while also handling safe disposal, errors, being memory-efficient, etc., is pretty difficult. You may want to take a look at [ANN] First release of streaming
