Advent of Code, Scanf

One of my goals while doing Advent of Code is to get better at parsing text with the Scanf module.

On day 6, the problem involved reading a text file and splitting it into a list of lists, where each element of the outer list represents a line and each inner list holds that line's tokens, separated by whitespace.


let while_success (f : unit -> 'a) : 'a list =
  let x : 'a list ref = ref [] in
  try 
    while true do
      x := f () :: !x
    done;
    assert false;
  with
  | _ -> List.rev !x;;

let lines =
  while_success (fun () ->
      Scanf.scanf "%_0c%[^\n]\n" (fun s ->
          String.split_on_char ' ' s |> List.filter (fun s -> s <> "")))

This is what I ended up doing, and it feels suboptimal to me. I wish it were easier to pattern match on several format strings and take a different code branch depending on which one the next token matches. The point of the %_0c is to make scanf raise an exception when it reaches the end of the file, which breaks out of the while loop.

I also found it a little difficult to work around the fact that a " " in a format string matches an arbitrary amount of whitespace, newlines included, because the problem distinguishes between spaces and newlines.
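To make the whitespace issue concrete, here is a small sketch (the input strings are made up for illustration). The first format reads straight across a line break; the second uses %_[ ], which skips only spaces, so the newline stays visible and can be read with %c.

```ocaml
(* A space in a Scanf format matches any run of whitespace, newlines
   included, so this format silently reads across a line break. *)
let three = Scanf.sscanf "1 2\n3" "%d %d %d" (fun a b c -> (a, b, c))

(* %_[ ] reads and discards a run of spaces only; it stops at '\n',
   so the following %c sees the newline instead of skipping it. *)
let same_line_only =
  Scanf.sscanf "1 2\n3" "%d%_[ ]%d%c" (fun a b c -> (a, b, c))
```

Here three is (1, 2, 3) even though 3 is on the second line, while same_line_only is (1, 2, '\n').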

Any suggestions for making this cleaner? The overall control flow here doesn’t really feel like a good representation of the machine I want to describe for parsing the input.
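One way to make the control flow mirror a state machine is to peek at the next character with %0c (which reads a character without consuming it) and dispatch on it. This is only a sketch: tokenize_lines is a hypothetical name, it works on a string via Scanf.Scanning.from_string rather than stdin, and it treats only ' ' and '\n' as separators (tabs and '\r' end up inside tokens).

```ocaml
let tokenize_lines (s : string) : string list list =
  let ib = Scanf.Scanning.from_string s in
  (* State: tokens of the current line and finished lines, both reversed. *)
  let rec loop tokens lines =
    let finish_line () = List.rev tokens :: lines in
    if Scanf.Scanning.end_of_input ib then
      List.rev (if tokens = [] then lines else finish_line ())
    else
      (* Peek at the next character without consuming it. *)
      match Scanf.bscanf ib "%0c" Fun.id with
      | '\n' ->
        Scanf.bscanf ib "%_c" ();       (* consume the newline *)
        loop [] (finish_line ())
      | ' ' ->
        Scanf.bscanf ib "%_[ ]" ();     (* skip a run of spaces only *)
        loop tokens lines
      | _ ->
        let tok = Scanf.bscanf ib "%[^ \n]" Fun.id in
        loop (tok :: tokens) lines
  in
  loop [] []
```

Each match arm is one transition, so adding a new kind of token means adding one arm.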

Another, related problem: the first n lines of the file have the form %d-%d, the remaining lines have the form %d, and we want to return two lists, one of type (int * int) list and another of type int list. What kind of control flow should I use to switch from one scanner to the other? A cascading try block that falls through from one scanner to the next would be nice, but reading input is destructive to the buffer, so there's added complexity here.
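If each line is first read into its own string, the destructiveness goes away: every attempt scans a fresh buffer built from that string, so a failed attempt consumes nothing and falling through is safe. Below is a sketch of such a cascade; item, try_scan, first_match, parse_line and split_input are all hypothetical names, and the exceptions caught are the ones Scanf raises on a mismatch, a bad number, or running out of input.

```ocaml
type item = Range of int * int | Id of int

(* Runs one scan attempt, turning Scanf's usual failure exceptions into None. *)
let try_scan (attempt : unit -> 'a) : 'a option =
  match attempt () with
  | v -> Some v
  | exception (Scanf.Scan_failure _ | Failure _ | End_of_file) -> None

(* Tries each attempt in order and returns the first that succeeds. *)
let rec first_match (attempts : (unit -> 'a) list) : 'a option =
  match attempts with
  | [] -> None
  | attempt :: rest ->
    (match try_scan attempt with
     | Some v -> Some v
     | None -> first_match rest)

let parse_line (line : string) : item option =
  first_match
    [ (fun () -> Scanf.sscanf line "%d-%d%!" (fun a b -> Range (a, b)));
      (fun () -> Scanf.sscanf line "%d%!" (fun n -> Id n)) ]

(* Sorts lines into the two result lists; lines matching neither format
   (such as a blank separator line) are dropped. *)
let split_input (lines : string list) : (int * int) list * int list =
  List.fold_right
    (fun line (ranges, ids) ->
      match parse_line line with
      | Some (Range (a, b)) -> ((a, b) :: ranges, ids)
      | Some (Id n) -> (ranges, n :: ids)
      | None -> (ranges, ids))
    lines ([], [])
```

This version does not enforce that all ranges come before all ids; if that ordering matters, switching on the blank separator line is the simpler choice.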

I noticed that Batteries' BatReturn module has some interesting control-flow operations for jumping around using labels; they seem well suited to implementing state machines.


On day 6 of 2025 it's much easier to retain the exact spacing, so I didn't use scanf for it. That " " feature, which works the same way as in C's scanf, is never what I want for non-interactive use, and it makes interactive programs seem broken:

Enter three numbers: 1 2<enter>
<enter>
  <enter>
<user: why isn't it responding?>

The user misread the prompt, and scanf is still waiting for a third number. The interface I always prefer is:

Enter three numbers: 1 2<enter>
That's only two.
Enter three numbers: 

What I do instead is read lines of input and then pass them to sscanf.
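A minimal sketch of that line-then-sscanf approach for the prompt above. read_three is a hypothetical name, and the read parameter stands in for read_line so the loop can be tried without a terminal; the retry message is generic rather than the exact one shown above.

```ocaml
let rec read_three (read : unit -> string) : int * int * int =
  print_string "Enter three numbers: ";
  flush stdout;
  (* A real end of input from [read] is deliberately not caught here. *)
  let line = read () in
  (* Leading and trailing spaces allow padding; %! requires the line to end there. *)
  match Scanf.sscanf line " %d %d %d %!" (fun a b c -> (a, b, c)) with
  | triple -> triple
  | exception (Scanf.Scan_failure _ | Failure _ | End_of_file) ->
    print_endline "Please enter exactly three numbers.";
    read_three read
```

Because sscanf only ever sees one line, a short answer fails immediately instead of waiting for more input. Call it as read_three read_line for real use.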

Reading whole lines also gives you an easy way to handle the two sections of 2025 day 5's input: you can check each line before calling sscanf, and end the first section at the empty line between the sections:

let ranges =
  let rec aux acc =
    match read_line () with
    | "" -> acc
    | line -> aux (Scanf.sscanf line "%d-%d%!" Pair.make :: acc)
  in
  aux [] |> List.rev

let ids =
  let rec aux acc =
    try aux (Scanf.sscanf (read_line ()) "%d%!" Fun.id :: acc)
    with End_of_file -> acc
  in
  aux [] |> List.rev

which gives you, for the example input:

val ranges : (int * int) list = [(3, 5); (10, 14); (16, 20); (12, 18)]
val ids : int list = [1; 5; 8; 11; 17; 32]

Something neat: this program can be tested repeatedly in utop. Load the file with #use, paste the input, then press Ctrl-D to end input. I'm used to interactive programs being unusable after a Ctrl-D.

… I don’t get why this has an infinite recursion, though:

let ids =
  let rec aux () =
    try Scanf.sscanf (read_line ()) "%d" Fun.id :: aux ()
    with End_of_file -> []
  in aux ()

when this very similar code is fine:

let ids =
  let rec aux () =
    match read_line () with
    | line -> Scanf.sscanf line "%d" Fun.id :: aux ()
    | exception End_of_file -> []
  in aux ()

Keep in mind that the order of evaluation is unspecified. In particular, your code could just as well be executed as if it had been written

let rec aux () =
  try
    let v = aux () in
    Scanf.sscanf (read_line ()) "%d" Fun.id :: v
  with End_of_file -> []

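which is now an obvious infinite loop. In practice, the current OCaml compilers evaluate the arguments of :: right to left, so this is what actually happens. The match version is fine because its scrutinee, read_line (), is always evaluated before the branch that recurses.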
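A minimal change that keeps the try but removes the dependence on evaluation order is to bind the parsed value with let before recursing, since let ... in always evaluates its right-hand side first. In this sketch, read_ints is a hypothetical name and read stands in for read_line.

```ocaml
let read_ints (read : unit -> string) : int list =
  let rec aux () =
    try
      (* The line is read and parsed here, before the recursive call,
         whatever order the compiler picks for the arguments of (::). *)
      let n = Scanf.sscanf (read ()) "%d" Fun.id in
      n :: aux ()
    with End_of_file -> []
    (* Caveats: an empty line also raises End_of_file and ends the list,
       and the recursive call inside [try] is not a tail call. *)
  in
  aux ()
```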


I think this is what I meant when I said BatReturn could be useful here:

let read_input () =
  let open BatReturn in
  label (fun break ->
      let lines = ref [] in 
      while true do
        try
          lines := read_line () :: !lines 
        with
        | End_of_file -> return break (List.rev !lines)
      done;
      assert false
    )

I find it more readable, or at least more natural, with the try inside the while loop rather than outside it.
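For comparison, a stdlib-only sketch of the same shape, with the handler inside the loop, using a local exception (let exception) as the early exit in place of BatReturn's label. read_lines and Done are hypothetical names, and read stands in for read_line.

```ocaml
let read_lines (read : unit -> string) : string list =
  (* Raising Done carries the result out of the otherwise infinite loop. *)
  let exception Done of string list in
  let lines = ref [] in
  try
    while true do
      match read () with
      | line -> lines := line :: !lines
      | exception End_of_file -> raise (Done (List.rev !lines))
    done;
    assert false
  with Done result -> result
```

Because Done is local to one call, it cannot be caught by unrelated handlers elsewhere in the program.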

I was surprised that sscanf s apparently behaves purely: each call starts over from the beginning of s, whereas I had assumed it would create an internal buffer and keep advancing it.

let break_up_string_int s =
  let open BatReturn in
  label (fun break ->
      let tokens = ref [] in
      let scanner = Scanf.sscanf s in 
      while true do
        try
          print_endline "Scanning...";
          tokens := (scanner "%d " (fun x -> Printf.printf "%d\n" x; x)) :: !tokens
        with
        | End_of_file -> return break (List.rev !tokens)
      done;
      assert false
    )

On input 7446 342 … this prints 7446 over and over. I changed it to

let scanner = Scanf.bscanf @@ Scanf.Scanning.from_string s in

and this worked, of course. Still, I find it slightly counterintuitive that sscanf is the only one of these functions that doesn't expose a stateful buffer to the user like this.
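With one shared buffer, the loop can also stop without relying on an exception by checking Scanf.Scanning.end_of_input. A sketch, where ints_of_string is a hypothetical name; non-numeric tokens still raise Scan_failure.

```ocaml
let ints_of_string (s : string) : int list =
  (* One buffer for the whole string, so each bscanf resumes where the last stopped. *)
  let ib = Scanf.Scanning.from_string s in
  let rec aux acc =
    Scanf.bscanf ib " " ();  (* skip whitespace, newlines included *)
    if Scanf.Scanning.end_of_input ib then List.rev acc
    else aux (Scanf.bscanf ib "%d" Fun.id :: acc)
  in
  aux []
```

Skipping whitespace before the end-of-input check means trailing spaces or an all-blank string simply yield the numbers read so far.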