In a previous post I described my path from LALR parsing to combinator
parsing. I am now more and more convinced that combinator parsing is a really
good and flexible way to write parsers. The new release 0.5.0 of Fmlib focuses
on layout parsing and on nicely formatted error messages, all by using combinator
parsing.
Most programming languages express hierarchical structures by some kind of
parentheses. Algol-like languages use begin and end, C-like languages use curly
braces { and } to enclose blocks of code. Since blocks can be nested inside
blocks, the hierarchical or tree structure is well expressed by the syntax.
For the human reader, blocks are usually indented to make the hierarchical
structure graphically visible. Programming languages like Haskell and
Python omit the parentheses and express the hierarchical structure by
indentation, i.e. the indentation is part of the grammar. This is pleasing to
the eye, because many parentheses can be omitted.
The hierarchical structure in the following schematic source file is
immediately visible without the need for parentheses:
xxxxxxxxxxx
    xxx
    xxx
        xxxxxxx
    xxxxxxxx
xxx
Lower level blocks are indented with respect to their parent block and siblings
at the same level are vertically aligned.
Because of this good readability, configuration languages like YAML have
become very popular.
Unfortunately, not many parser libraries support indentation-sensitive
grammars. The library Fmlib can parse languages whose grammar uses indentation to structure blocks hierarchically.
Only three combinators are needed to add layout parsing to combinator
parsing. Suppose that p is a combinator parsing a certain construct. Then we
have:
- indent 4 p: parse the construct described by p, indented at least 4
  columns relative to its environment.
- align p: parse the construct described by p, aligned vertically with its
  siblings.
- detach p: parse the construct described by p without any indentation or
  alignment restrictions.
To parse a list of one or more p constructs that are vertically aligned with
each other and indented by at least one column relative to their
environment, we just write
one_or_more (align p) |> indent 1
and parse a structure with the schematic layout
xxxxxxxx
    pppppppp
    pppppp
    pppp
xxxxx
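To make the semantics of the three combinators concrete, here is a minimal, self-contained OCaml sketch. It is a toy model and not Fmlib's implementation or API: the input is pre-split into lines, each line counts as one token, and the helpers `item`, `with_layout`, `example`, `run` and `lines_of_string` are hypothetical. How `indent` measures from its environment and how `align` fixes a column are assumptions chosen to match the description above.

```ocaml
(* Toy model of layout combinators. indent, align, detach and one_or_more
   follow the names in the post, but this is NOT the Fmlib implementation:
   the input is pre-split into lines and each whole line is one token. *)

(* A non-blank source line: column of its first character and its text. *)
type line = { col : int; text : string }

(* Parser state: unconsumed lines plus the current layout constraints. *)
type state = {
  lines : line list;
  min_col : int;          (* leftmost column allowed in the current block *)
  align_col : int option; (* column fixed by the first aligned sibling *)
}

type 'a parser = state -> ('a * state) option

(* Split text into lines, measuring indentation in spaces; skip blank lines. *)
let lines_of_string (s : string) : line list =
  String.split_on_char '\n' s
  |> List.filter_map (fun raw ->
         let n = String.length raw in
         let rec first i = if i < n && raw.[i] = ' ' then first (i + 1) else i in
         let c = first 0 in
         if c = n then None
         else Some { col = c; text = String.sub raw c (n - c) })

(* Hypothetical token parser: consume one line not left of the minimum. *)
let item : string parser = fun st ->
  match st.lines with
  | l :: rest when l.col >= st.min_col -> Some (l.text, { st with lines = rest })
  | _ -> None

(* Run [p] under new constraints, then restore the caller's constraints. *)
let with_layout min_col align_col (p : 'a parser) (st : state) =
  match p { st with min_col; align_col } with
  | Some (x, st') ->
      Some (x, { st' with min_col = st.min_col; align_col = st.align_col })
  | None -> None

(* indent k p: p lies at least k columns right of its environment. *)
let indent k (p : 'a parser) : 'a parser = fun st ->
  with_layout (st.min_col + k) None p st

(* detach p: p is parsed without indentation or alignment restrictions. *)
let detach (p : 'a parser) : 'a parser = fun st -> with_layout 0 None p st

(* align p: p starts in the column of its aligned siblings; the first
   sibling fixes that column, and p's own content is measured from it. *)
let align (p : 'a parser) : 'a parser = fun st ->
  match st.lines with
  | l :: _
    when l.col >= st.min_col
         && (match st.align_col with None -> true | Some c -> c = l.col) -> (
      match with_layout l.col None p st with
      | Some (x, st') -> Some (x, { st' with align_col = Some l.col })
      | None -> None)
  | _ -> None

(* Repeat [p] while it succeeds; at least one success is required.
   [p] must consume input on success, otherwise this would not terminate. *)
let one_or_more (p : 'a parser) : 'a list parser = fun st ->
  let rec more acc st =
    match p st with
    | Some (x, st') -> more (x :: acc) st'
    | None -> (List.rev acc, st)
  in
  match p st with
  | Some (x, st') -> Some (more [ x ] st')
  | None -> None

(* Header line, an indented block of aligned lines, a footer, end of input. *)
let example : (string * string list * string) parser = fun st ->
  match item st with
  | None -> None
  | Some (header, st) -> (
      match (one_or_more (align item) |> indent 1) st with
      | None -> None
      | Some (body, st) -> (
          match item st with
          | Some (footer, st) when st.lines = [] ->
              Some ((header, body, footer), st)
          | _ -> None))

let run (p : 'a parser) (src : string) : 'a option =
  match p { lines = lines_of_string src; min_col = 0; align_col = None } with
  | Some (x, _) -> Some x
  | None -> None

let () =
  match run example "xxxxxxxx\n    pppppppp\n    pppppp\n    pppp\nxxxxx" with
  | Some (_, body, _) -> Printf.printf "block of %d aligned lines\n" (List.length body)
  | None -> print_endline "layout error"
```

In this model, shifting `pppppp` one column to the right, or starting the block in column 0, makes `example` fail, while `detach` lets the wrapped construct ignore an enclosing `indent`. Restoring the caller's constraints after each combinator keeps every layout rule local to the construct it wraps.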
It is important for a parser writer to make syntax error messages
user-friendly. Fmlib has some support for writing friendly error messages. There is the operator
<?>, copied from the Haskell library
parsec, which equips a combinator with a descriptive error message in case it fails to parse its construct.
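The idea behind <?> can be sketched with a tiny self-contained parser in OCaml. This is not Fmlib's implementation: the types `outcome` and `parser` and the helpers `satisfy` and `<|>` are hypothetical, and the sketch follows parsec's convention that a label replaces the expectations only when the parser fails without consuming input.

```ocaml
(* Toy parser whose failures carry a list of expectations (hypothetical). *)
type 'a outcome =
  | Parsed of 'a * int          (* parsed value and next position *)
  | Failed of int * string list (* failure position and expectations *)

type 'a parser = string -> int -> 'a outcome

(* Accept one character satisfying [pred]; [what] names it on failure. *)
let satisfy (what : string) (pred : char -> bool) : char parser = fun s i ->
  if i < String.length s && pred s.[i] then Parsed (s.[i], i + 1)
  else Failed (i, [ what ])

(* Alternative: if [p] fails without consuming input, try [q];
   if both fail at the same position, merge their expectations. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser = fun s i ->
  match p s i with
  | Failed (j, e1) when j = i -> (
      match q s i with
      | Failed (k, e2) when k = i -> Failed (i, e1 @ e2)
      | r -> r)
  | r -> r

(* p <?> label: if [p] fails without consuming input, report [label]
   instead of p's low-level expectations. *)
let ( <?> ) (p : 'a parser) (label : string) : 'a parser = fun s i ->
  match p s i with
  | Failed (j, _) when j = i -> Failed (i, [ label ])
  | r -> r

let digit = satisfy "digit" (fun c -> c >= '0' && c <= '9')
let lower = satisfy "lowercase letter" (fun c -> c >= 'a' && c <= 'z')

(* The same parser, once without and once with a descriptive label. *)
let plain = digit <|> lower
let labelled = (digit <|> lower) <?> "alphanumeric character"
```

On the input `?`, `plain` reports the two low-level expectations `digit` and `lowercase letter`, whereas `labelled` reports the single expectation `alphanumeric character`.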
At the end of a failed parse, the syntax (or semantic) errors have to be
presented to the user. Suppose there is a combinator parser for a YAML-like
structure. By default, the library writes error messages for you in the form
1 |
2 | names:
3 | - Alice
3 | - Bob
4 |
5 | category: encryption
    ^

I have encountered something unexpected. I was expecting one of

- at 3 columns after
    - sequence element: "- <yaml value>"
- at 2 columns before
    - key value pair: "<key>: <yaml value>"
- end of input
The raw information (line and column numbers, individual expectations, failed
indentation or alignment expectations) is available as well, so that you can
present the error messages to the user in any form you like.
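Turning such raw information into a message is ordinary string formatting. The sketch below is a hypothetical renderer, not Fmlib's: its inputs (a 1-based line number, a 0-based column and a list of expectation strings) are assumptions. It echoes the source lines up to the failing one with line numbers and puts a caret under the error column.

```ocaml
(* Hypothetical renderer: raw error data (1-based [line], 0-based [col],
   expectation strings) becomes a source excerpt with a caret. *)
let render ~(source : string) ~(line : int) ~(col : int)
    (expected : string list) : string =
  let buf = Buffer.create 256 in
  (* echo every source line up to and including the failing one *)
  List.iteri
    (fun i text ->
      if i + 1 <= line then
        Buffer.add_string buf (Printf.sprintf "%d | %s\n" (i + 1) text))
    (String.split_on_char '\n' source);
  (* put the caret under column [col] of the failing line *)
  let prefix = String.length (Printf.sprintf "%d | " line) in
  Buffer.add_string buf (String.make (prefix + col) ' ' ^ "^\n");
  Buffer.add_string buf "I was expecting one of\n";
  List.iter (fun e -> Buffer.add_string buf ("- " ^ e ^ "\n")) expected;
  Buffer.contents buf

let () =
  (* "- Bob" is misaligned by one column with respect to "- Alice" *)
  print_string
    (render ~source:"names:\n  - Alice\n   - Bob" ~line:3 ~col:3
       [ "sequence element aligned at column 2"; "end of input" ])
```

Because the caret offset is computed from the prefix of the failing line, it stays correct when line numbers have different widths.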
The library also contains a component Fmlib_pretty for pretty printing ASCII text.