One of my goals while doing Advent of Code is to get better at parsing text with the Scanf module.
On day 6, the problem involved reading a text file and splitting it into a list of lists: each element of the outer list represents a line, and each inner list holds that line's whitespace-separated tokens.
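(* Call [f] repeatedly, collecting its results in order, until it raises. *)
let while_success (f : unit -> 'a) : 'a list =
  let acc : 'a list ref = ref [] in
  try
    while true do
      acc := f () :: !acc
    done;
    assert false (* unreachable: the loop only exits via an exception *)
  with
  (* Stop at end of input or at the first line that fails to scan. *)
  | End_of_file | Scanf.Scan_failure _ -> List.rev !acc;;

(* "%_0c" tests the next character without consuming it and raises
   End_of_file at the end of input, which ends the loop above. *)
let lines =
  while_success (fun () ->
    Scanf.scanf "%_0c%[^\n]\n" (fun s ->
      String.split_on_char ' ' s |> List.filter (fun s -> s <> "")))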
This is what I ended up doing, and it feels suboptimal to me. I wish it were easier to pattern match on multiple format strings and enter a different code branch depending on which one matches the next token. The point of the %_0c is to make the scanner raise an exception when it reaches the end of the file, which breaks the while loop.
I also found it a little difficult to work around the fact that " " in a format string matches an arbitrary amount of whitespace, including newlines, because the problem distinguishes between spaces within a line and the newlines between lines.
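One way around this might be to split the input into lines first and only then hand each line to Scanf, so that whitespace skipping can never cross a line boundary. Below is a minimal sketch of that idea, assuming the whole input is already in a string. The names tokens_of_line and parse_lines are hypothetical, not library functions.

```ocaml
(* Split one line into whitespace-separated tokens using Scanf only.
   The scanning buffer covers a single line, so the " " in the format
   cannot skip past a newline into the next line. *)
let tokens_of_line (line : string) : string list =
  let ib = Scanf.Scanning.from_string line in
  let rec loop acc =
    (* " %s" skips leading whitespace, then reads up to the next
       whitespace character; it yields "" once the buffer is exhausted. *)
    match Scanf.bscanf ib " %s" (fun tok -> tok) with
    | "" -> List.rev acc
    | tok -> loop (tok :: acc)
    (* Defensive: also treat End_of_file as the end of the line. *)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Split the text on '\n' and tokenize each line.
   Assumption: empty lines (including the one produced by a trailing
   newline) carry no information and are dropped. *)
let parse_lines (text : string) : string list list =
  String.split_on_char '\n' text
  |> List.filter (fun l -> l <> "")
  |> List.map tokens_of_line
```

This swaps the single scanf loop for String.split_on_char plus a per-line scanner, so it is a different technique from the one above rather than a refinement of it.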
Any suggestions for making this cleaner? The overall control flow here doesn’t really feel like a good representation of the machine I want to describe for parsing the input.
Another, related problem: the first n lines of the file are of the form %d-%d, the remaining lines are of the form %d, and we want to return two lists: one of type (int * int) list and another of type int list. What kind of control flow should I use to switch from one scanner to the other? It seems like it would be nice to have a cascading try block where we fall through from one scanner to the next, but reading input is destructive to the buffer, so there’s added complexity here.
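One possible way to get the cascading behaviour is to read each line into a string first and run Scanf.sscanf on that string: every sscanf call starts again from the beginning of the string, so a failed attempt consumes nothing. Here is a rough sketch, assuming the lines have already been split into a string list. parse_range, parse_single and parse_sections are hypothetical helper names.

```ocaml
(* Try to read a whole line as "lo-hi". "%!" requires end of input, so
   trailing garbage makes the attempt fail instead of being ignored. *)
let parse_range (s : string) : (int * int) option =
  try Some (Scanf.sscanf s "%d-%d%!" (fun lo hi -> (lo, hi)))
  with Scanf.Scan_failure _ | End_of_file | Failure _ -> None

(* Try to read a whole line as a single integer. *)
let parse_single (s : string) : int option =
  try Some (Scanf.sscanf s "%d%!" (fun n -> n))
  with Scanf.Scan_failure _ | End_of_file | Failure _ -> None

(* Take the longest prefix of lines that parse as ranges, then read the
   remaining lines as single integers.
   Assumption: lines in the second part that are not integers (for
   example a blank separator line) are skipped silently. *)
let parse_sections (lines : string list) : (int * int) list * int list =
  let rec ranges acc = function
    | l :: rest ->
      (match parse_range l with
       | Some r -> ranges (r :: acc) rest
       | None -> (List.rev acc, l :: rest))
    | [] -> (List.rev acc, [])
  in
  let rs, rest = ranges [] lines in
  (rs, List.filter_map parse_single rest)
```

For example, parse_sections ["3-5"; "10-14"; ""; "1"; "5"; "8"] gives ([(3, 5); (10, 14)], [1; 5; 8]). The switch from one scanner to the other is just the point where parse_range first returns None, so no label or jump mechanism is needed for this particular shape of input.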
I noticed that the BatReturn module in Batteries has some interesting control flow operations for jumping around using labels; it seems like a good fit for implementing state machines.