Angstrom parser optimization

loxs · March 21, 2018, 9:29pm

I am new to parsing and still not sure I have all the terminology right. After quite a struggle I managed to write a parser that works, but it is suboptimal.

Here is a simplified version of my problem.
I need to parse lines of a file which look like this:

    SomeKey  SomeValue  ;SomeComment
    SomeKey  SomeValue  
    ;SomeComment

Some of these are optional and we can also have blank lines, so here is where I am at the moment:

open Angstrom
(*  skipping definitions of sub-parsers *)

let data_line =
  choice [
    list [comment];
    list [whitespace; comment;];
    list [whitespace; key; whitespace; value; comment];
    list [whitespace; key; whitespace; value;];
    list [whitespace; key; comment];
    list [whitespace; key;];
    list [whitespace];
  ]

This works correctly on all my sample files and tests (which I think cover all the possible combinations), but I think it’s a lot slower than it should be. To identify a [whitespace; key] line (the most common kind), the parser has to go over the whole line at least 3 times, because each alternative that fails makes choice start again from the beginning of the line.

I tried playing with the Angstrom.Buffered interface and that’s probably what I need, but I wasn’t able to figure out how to make it compile. I am also not able to find examples (which I can comprehend) on the internet.
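For anyone looking for a compiling starting point, here is a minimal sketch of how Angstrom.Buffered can be driven. As far as I understand it, Buffered is mainly about feeding input in chunks and does not by itself reduce backtracking inside choice. The `words` parser and the `parse_chunks` helper are hypothetical stand-ins and not part of the original code.

```ocaml
open Angstrom

(* Hypothetical stand-in parser: words separated by single spaces. *)
let word = take_while1 (fun c -> c <> ' ')
let words = sep_by (char ' ') word

(* Feed the chunks one by one, then signal end of input with `Eof. *)
let parse_chunks (chunks : string list) : (string list, string) result =
  let state = Buffered.parse words in
  let state =
    List.fold_left
      (fun st chunk -> Buffered.feed st (`String chunk))
      state chunks
  in
  Buffered.state_to_result (Buffered.feed state `Eof)

let () =
  (* A word may be split across chunk boundaries. *)
  match parse_chunks [ "Some"; "Key Some"; "Value" ] with
  | Ok ws -> List.iter print_endline ws
  | Error msg -> prerr_endline ("parse error: " ^ msg)
```

In a real program the chunks would typically come from reading a file or socket in fixed-size pieces instead of a list.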

seliopou · March 23, 2018, 10:30pm

I’m not sure if you care to preserve comments in your parser result, but if you do, you could implement your parser like this:

let lex p =
  p <* whitespace

let data_line =
  let some x = Some x in
  lift3 (fun x y z -> (x, y, z))
    (lex key)
    (lex value)
    (lex (option None (comment >>| some)))

let line =
  whitespace *>
  choice
    [ (lex comment >>| fun comment   -> `Comment comment)
    ; (data_line   >>| fun (k, v, c) -> `Data(k, v, c)) ]
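To try the choice-based version end to end, here is a self-contained sketch. The definitions of `whitespace`, `key`, `value` and `comment` are assumptions, since the thread skips them, and `parse_line` is a hypothetical helper. It also assumes a version of Angstrom where `parse_string` takes the `~consume` argument.

```ocaml
open Angstrom

(* Assumed sub-parsers; the real ones are not shown in the thread. *)
let is_space = function ' ' | '\t' -> true | _ -> false
let is_word_char = function
  | ' ' | '\t' | '\n' | '\r' | ';' -> false
  | _ -> true

let whitespace = skip_while is_space (* also succeeds on no input *)
let key = take_while1 is_word_char
let value = take_while1 is_word_char
let comment = char ';' *> take_while (fun c -> c <> '\n' && c <> '\r')

type parsed =
  [ `Comment of string | `Data of string * string * string option ]

(* Run p, then skip any trailing whitespace. *)
let lex p = p <* whitespace

let data_line =
  let some x = Some x in
  lift3 (fun k v c -> (k, v, c))
    (lex key)
    (lex value)
    (lex (option None (comment >>| some)))

(* Each alternative is parenthesised so that ';' separates list items. *)
let line : parsed t =
  whitespace *>
  choice
    [ (lex comment >>| fun c -> `Comment c)
    ; (data_line >>| fun (k, v, c) -> `Data (k, v, c)) ]

(* Hypothetical helper: parse exactly one line held in a string. *)
let parse_line (s : string) : (parsed, string) result =
  parse_string ~consume:Consume.All line s

let () =
  match parse_line "  SomeKey  SomeValue  ;SomeComment" with
  | Ok (`Data (k, v, _)) -> Printf.printf "data: %s = %s\n" k v
  | Ok (`Comment c) -> Printf.printf "comment: %s\n" c
  | Error msg -> Printf.printf "error: %s\n" msg
```

Because the leading whitespace is consumed once before choice, a failed comment alternative only costs a single character check before data_line is tried.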

To eliminate choice entirely, you could use a bit of lookahead:

let line =
  whitespace *>
  peek_char_fail
  >>= function
    | ';' -> lex comment >>| fun comment -> `Comment comment
    | _   -> data_line   >>| fun (k, v, c) -> `Data(k, v, c)
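The lookahead variant can be exercised the same way. This is a sketch under the same assumptions: the sub-parser definitions and the `parse_line` helper are made up for illustration, and `parse_string` is assumed to take `~consume`.

```ocaml
open Angstrom

(* Assumed sub-parsers; the real ones are not shown in the thread. *)
let is_space = function ' ' | '\t' -> true | _ -> false
let is_word_char = function
  | ' ' | '\t' | '\n' | '\r' | ';' -> false
  | _ -> true

let whitespace = skip_while is_space
let key = take_while1 is_word_char
let value = take_while1 is_word_char
let comment = char ';' *> take_while (fun c -> c <> '\n' && c <> '\r')

type parsed =
  [ `Comment of string | `Data of string * string * string option ]

let lex p = p <* whitespace

let data_line =
  lift3 (fun k v c -> (k, v, c))
    (lex key)
    (lex value)
    (lex (option None (comment >>| fun c -> Some c)))

(* Peek at the next character without consuming it and pick the
   branch directly, so no alternative is tried and then abandoned. *)
let line : parsed t =
  whitespace *> peek_char_fail >>= function
  | ';' -> lex comment >>| fun c -> `Comment c
  | _ -> data_line >>| fun (k, v, c) -> `Data (k, v, c)

(* Hypothetical helper: parse exactly one line held in a string. *)
let parse_line (s : string) : (parsed, string) result =
  parse_string ~consume:Consume.All line s

let () =
  List.iter
    (fun s ->
      match parse_line s with
      | Ok (`Comment c) -> Printf.printf "comment: %s\n" c
      | Ok (`Data (k, v, _)) -> Printf.printf "data: %s = %s\n" k v
      | Error msg -> Printf.printf "error: %s\n" msg)
    [ "   ;SomeComment"; "SomeKey SomeValue" ]
```

Note that peek_char_fail fails when no input is left, so a blank line is rejected by this `line` parser and would need separate handling.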

Hope this helps.
