Testing with different number of lines of the same file

A counter would work. What you might have tried and had trouble with is breaking out of the loop early, which this does with a local exception:

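let fold_take (type acc) (f : acc -> string -> acc) (init : acc) limit ic =
  (* limit must be positive: f is applied before the limit check *)
  assert (limit > 0);

  (* local exception that carries the accumulator out of the fold *)
  let exception Return of acc in
  try
    In_channel.fold_lines
      (fun (n, acc) line ->
        let acc = f acc line in
        (* stop as soon as [limit] lines have been folded *)
        if n >= limit then raise (Return acc);
        n + 1, acc)
      (1, init) ic
    |> snd (* input ended before the limit; Stdlib's snd works on any OCaml version *)
  with Return v -> v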

usage:

# fold_take (fun () line -> print_endline ("::" ^ line)) () 2 stdin;;
1
::1
2
::2
- : unit = ()

This early-exit pattern comes up a few times in the thread "What is the programming pattern with multiple if else branching" (see post #5 by jbeckford).
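For comparison, here is a minimal sketch of the plain counter approach, with no exception involved. The name fold_take_rec is made up for this example. It assumes OCaml 4.14 or later for In_channel.input_line, and unlike fold_take it accepts a limit of 0, in which case it reads nothing.

```ocaml
(* Hypothetical helper: fold f over at most [limit] lines of [ic]. *)
let fold_take_rec f init limit ic =
  let rec go n acc =
    (* check the counter before reading, so no extra line is consumed *)
    if n >= limit then acc
    else
      match In_channel.input_line ic with
      | None -> acc (* input ended before the limit was reached *)
      | Some line -> go (n + 1) (f acc line)
  in
  go 0 init
```

Something like fold_take_rec (fun acc l -> l :: acc) [] 2 ic |> List.rev should then give the first two lines of ic. The recursion is a tail call, so the line count is not bounded by the stack.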

Stdlib I/O buffers aggressively, so reading line by line should be fine for performance. I'd avoid reading the whole file into memory only to take n lines. A more advanced option is probably memory-mapping the file and working with (index, length) pairs and bigstringaf. Or, parsing the memory-mapped file with Angstrom:

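open Angstrom

(* one line: everything up to a '\n', which is consumed but not returned *)
let line = take_while (( <> ) '\n') <* char '\n'
(* apply f to every line for its side effect *)
let iter_lines f = many (line >>| f) *> return ()

let () =
  let fd = Unix.openfile Sys.argv.(1) [O_RDONLY] 0 in
  let st = Unix.fstat fd in
  (* map the whole file as a 1-D char bigarray (shared = false: private mapping) *)
  let map =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout false [|st.st_size|])
  in
  (* Consume.All: the parse fails unless every byte is consumed,
     so a final line without a trailing '\n' is an error here *)
  Angstrom.parse_bigstring ~consume:Angstrom.Consume.All
    (iter_lines print_endline) map
  |> Result.get_ok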
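
The (index, length) idea can also be sketched without Angstrom, using only Stdlib's Bigarray. The names bigstring, of_string, line_spans and span_to_string below are made up for this example. of_string only stands in for the memory-mapped buffer so the sketch runs on its own. In real use the buffer would come from Unix.map_file as above.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical stand-in for a memory-mapped file: copy a string into a bigarray. *)
let of_string s : bigstring =
  let b =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set b i c) s;
  b

(* Hypothetical helper: (offset, length) of the first [limit] lines of [buf].
   No line contents are copied; the '\n' is not part of a span. *)
let line_spans (buf : bigstring) limit =
  let len = Bigarray.Array1.dim buf in
  let rec go start i n acc =
    if n >= limit then List.rev acc
    else if i >= len then
      (* keep a final line that has no trailing newline, if there is one *)
      List.rev (if i > start then (start, i - start) :: acc else acc)
    else if Bigarray.Array1.get buf i = '\n' then
      go (i + 1) (i + 1) (n + 1) ((start, i - start) :: acc)
    else go start (i + 1) n acc
  in
  go 0 0 0 []

(* Copy a span out only when the text itself is needed. *)
let span_to_string (buf : bigstring) (off, len) =
  String.init len (fun i -> Bigarray.Array1.get buf (off + i))
```

The scan stops after limit newlines, so for a small n only the start of the mapping is ever touched.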