I am new to parsing and still not sure I have all the terminology right. After quite some struggle, I have only managed to write a suboptimal parser, though luckily one that works.
Here is a simplified version of my problem.
I need to parse lines of a file which look like this:
SomeKey SomeValue ;SomeComment
SomeKey SomeValue
;SomeComment
Some of these parts are optional, and blank lines can also occur. Here is where I am at the moment:
open Angstrom
(* skipping definitions of sub-parsers *)
let data_line =
  choice [
    list [comment];
    list [whitespace; comment;];
    list [whitespace; key; whitespace; value; comment];
    list [whitespace; key; whitespace; value;];
    list [whitespace; key; comment];
    list [whitespace; key;];
    list [whitespace];
  ]
This works correctly and passes all my sample files and tests (covering what I think are all the possible combinations), but I think it’s a lot slower than it should be. To identify a [whitespace; key] line (the most common kind), the parser has to go over the whole line at least 3 times, because the earlier alternatives each scan the line before failing.
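One way to avoid the repeated scans might be to factor out the shared prefixes, so that each piece of the line is attempted only once, in order. The following is a minimal sketch of that idea, not a drop-in replacement. The names `is_space`, `is_word`, `word`, `opt`, and `parse` are hypothetical, and the `whitespace` and `comment` definitions are stand-ins for the sub-parsers skipped above. It assumes keys and values contain no spaces or `;`, it returns a tuple of options instead of a list, and it uses the `~consume` argument of `parse_string`, which is only available in recent versions of Angstrom.

```ocaml
open Angstrom

(* Hypothetical stand-ins for the skipped sub-parsers. *)
let is_space = function ' ' | '\t' -> true | _ -> false
let is_word c = not (is_space c) && c <> ';' && c <> '\n'

let whitespace = take_while is_space   (* succeeds on zero or more blanks *)
let word = take_while1 is_word         (* assumed shape of a key or a value *)
let comment = char ';' *> take_while (fun c -> c <> '\n')

(* Turn a parser into an optional one: None when it does not match. *)
let opt p = option None (p >>| fun x -> Some x)

(* Left-factored line parser: key, value and comment are each tried once,
   left to right, instead of retrying whole-line alternatives. *)
let data_line =
  lift3 (fun key value cmt -> (key, value, cmt))
    (whitespace *> opt word)
    (whitespace *> opt word)
    (whitespace *> opt comment)

(* Parse one line (without its trailing newline) and require full consumption. *)
let parse line = parse_string ~consume:Consume.All data_line line
```

With this shape a blank line yields three `None` values and a comment-only line yields a comment with no key or value, so the seven alternatives collapse into a single pass. Whether this matches the real grammar depends on how `key` and `value` are actually defined.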
I tried playing with the Angstrom.Buffered interface, which is probably what I need, but I wasn’t able to figure out how to make it compile. I also can’t find examples on the internet that I can comprehend.