Parser combinators vs. parser preprocessors?

Chet_Murthy · March 6, 2020, 12:33am

This question is elicited by the Transept post, and specifically by the linked-to Parsec paper on parser combinators. I’ve started reading hte paper, and have comparing-and-contrasting with the implementation of the Ocaml parser itself.

To wit, it seems like there are a number of places where it’s useful/important to have a preprocessor that takes a language that is most definitely isn’t made of parser-combinators, and does various syntactic transformations on it. The example that comes to mind is the heavy use of inlining, and specifically in order to both achieve modularity (of a sort) while preserving the ability for the grammar-compiler to detect and implement proper precedence rules. I’m betting there are other examples.

Also, I’m personally a massive LL(1) (over LALR) bigot, so it’s not that I’m persuaded by “yacc” or anything: it’s a general feeling that the preprocessing of a grammar by various algorithms/heuristics is valuable, and hence, that parser-combinators are not the way to go, to write language-processors.

Maybe I’m wrong. I’d love to be wrong. Those tools don’t fit well into the rest of our libraries and code, after all.

XVilka · March 6, 2020, 4:27am

There is a Parsec-inspired parser combinator in OCaml - Angstrom.

gasche · March 7, 2020, 12:37pm

Combinators also describe a grammar; they can build a representation that is then processed. I think it would be perfectly reasonable to provide combinators to describe a L(AL)R grammar, and then a function from such a grammar to a parsing automaton, along with the result of various analyses. This would solve the “additional tooling” problem of typical parser generators, and also the “lack of conflict analysis” problem of typical parser combinator libraries. But it may require support for staging for performance reasons. Menhir takes a couple seconds to generate a parser from a big grammar, which is fine as a compile-time operation but I wouldn’t want to pay this cost every time I am starting an application containing the parser.

yallop · March 9, 2020, 9:11am

Readers of this thread may be interested in the asp (algebraic staged parsing) library (also described in the Transept post linked above), which is built on an approach along the lines @gasche describes:

combinators that describe a grammar (using context-free expressions)
an analysis (formulated as a type system) that ensures deterministic parsing
staging to eliminate performance overhead

The interface is pretty standard, with combinators for alternation, sequencing, etc., and performance is quite good (better than ocamlyacc on our benchmarks).

There’s a paper, A typed algebraic approach to parsing, that describes the design in more detail.

Grammars built using asp are essentially LL(1). (The weasel word “essentially” isn’t hiding much here, but the paper has the details.)

Topic		Replies	Views
My way from LALR parsing to combinator parsing Community	6	2745	March 12, 2021
[ANN] Transept 0.1.0: Generalised Parser Combinators Community announce	1	1392	March 5, 2020
Paper about fast parser combinators in OCaml? Ecosystem	1	836	March 31, 2022
OCaml Parser combinator library as powerful as fastparse(Scala)? Ecosystem	27	4233	November 25, 2022
[ANN] Fmlib 0.5.5 with emphasis on combinator parsing with lexers Ecosystem	2	372	September 26, 2023

Parser combinators vs. parser preprocessors?

Related topics