Testing with different number of lines of the same file

vivanov · February 7, 2026, 4:11am

Hey, I have a 10-million-line file that I run tests with. I test with different numbers of lines from it, e.g. first with 1k lines, then 100k lines, then 1m lines. Rather than creating several files with different numbers of lines, what would be a feasible way of achieving this? I was thinking of using a counter, or maybe reading all the lines of the file and then taking the first 100k. However, 1m seems like a lot.

jrfondren · February 7, 2026, 5:28am

A counter would work. What you might’ve tried and had problems with is breaking out of a loop, like this does:

let fold_take (type acc) (f : acc -> string -> acc) (init : acc) limit ic =
  (* always reads once *)
  assert (limit > 0);

  let exception Return of acc in
  try
    In_channel.fold_lines
      (fun (n, acc) line ->
        let acc = f acc line in
        if n >= limit then raise (Return acc);
        n + 1, acc)
      (1, init) ic
    |> Pair.snd
  with Return v -> v

usage:

# fold_take (fun () line -> print_endline ("::" ^ line)) () 2 stdin;;
1
::1
2
::2
- : unit = ()

This pattern comes up a few times in the thread "What is the programming pattern with multiple if else branching" (see post #5 by jbeckford).
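For comparison, here is a minimal sketch of the plain-counter version that avoids the exception by reading with In_channel.input_line in a recursive loop. The names fold_first_lines and count_first are hypothetical helpers made up for illustration, not part of the Stdlib, and the sketch assumes OCaml 4.14 or later for the In_channel module. Unlike fold_take above, it accepts a limit of 0 and then reads nothing.

```ocaml
(* Hypothetical helper: fold [f] over at most [limit] lines of [ic].
   Stops at the limit or at end of file, whichever comes first. *)
let fold_first_lines f init limit ic =
  let rec go n acc =
    if n >= limit then acc
    else
      match In_channel.input_line ic with
      | None -> acc (* file has fewer than [limit] lines *)
      | Some line -> go (n + 1) (f acc line)
  in
  go 0 init

(* Hypothetical helper: count how many of the first [limit] lines exist.
   The file is reopened on every call, so each run starts at line one. *)
let count_first path limit =
  In_channel.with_open_text path (fun ic ->
      fold_first_lines (fun n _line -> n + 1) 0 limit ic)
```

Running the same test at several sizes is then a matter of iterating over the limits, e.g. List.iter (fun n -> ignore (count_first path n)) [1_000; 100_000; 1_000_000], with the counting function swapped for the real test body.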

Stdlib I/O buffers aggressively, so reading line-by-line should be fine for performance. I’d avoid reading the file entirely into memory to then only take n lines. An advanced option is probably memory-mapping the file and working with (index, length) pairs and bigstringaf. Or, using Angstrom, which builds on it:

open Angstrom

let line = take_while (( <> ) '\n') <* char '\n'
let iter_lines f = many (line >>| f) *> return ()

let () =
  let fd = Unix.openfile Sys.argv.(1) [O_RDONLY] 0 in
  let st = Unix.fstat fd in
  let map =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout false [|st.st_size|])
  in
  Angstrom.parse_bigstring ~consume:Angstrom.Consume.All
    (iter_lines print_endline) map
  |> Result.get_ok

vivanov · February 7, 2026, 2:42pm

Thank you, Julian, this works. #resolved
