In a previous post I described my path from LALR parsing to combinator
parsing. I am now more and more convinced that combinator parsing is a
good and flexible way to write parsers. The new release 0.5.0 of Fmlib focuses
on layout parsing and nicely formatted error messages, both based on combinator
parsing.
The library can be installed via opam with opam install fmlib. There is a
GitHub repository hosting the source code. The API documentation can be found
online. See also a tutorial on combinator parsing.
Layout Parsing
Most programming languages express hierarchical structures by some kind of
parentheses. Algol-like languages use begin and end, C-like languages use curly
braces { and } to enclose blocks of code. Since blocks can be nested inside
blocks, the hierarchical or tree structure is well expressed by the syntax.
For the human reader, blocks are usually indented to make the hierarchical
structure graphically visible. Programming languages like Haskell and
Python omit the parentheses and express the hierarchical structure by
indentation, i.e. the indentation is part of the grammar. This is pleasing to
the eye, because many parentheses can be omitted.
The hierarchical structure in the following schematic source file is
immediately visible without the need for parentheses.
xxxxxxxxxxx
    xxx
    xxx
        xxxxxxx
        xxxxxxxx
    xxx
Lower level blocks are indented with respect to their parent block and siblings
at the same level are vertically aligned.
Because of this good readability, configuration languages like YAML have
become very popular.
Unfortunately, not many parsers support indentation sensitivity. Fmlib has
support for parsing languages whose grammar uses indentation to structure
blocks hierarchically.
Only 3 combinators are needed to introduce layout parsing into combinator
parsing. Suppose that p is a combinator parsing a certain construct. Then we
have
- indent 4 p: Parse the construct described by p indented at least 4
  columns relative to its environment.
- align p: Parse the construct described by p aligned vertically with its
  siblings.
- detach p: Parse the construct described by p without any indentation or
  alignment restrictions.
In order to parse a list of ps vertically aligned and indented relative to its
environment by at least one column we just write
one_or_more (align p) |> indent 1
and parse a structure with the schematic layout
xxxxxxxx
    pppppppp
    pppppp
    pppp
xxxxx
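To make the layout condition concrete, here is a small hand-written sketch in plain OCaml of what the combination one_or_more (align p) |> indent 1 has to check. It is not Fmlib's implementation and does not use its API. The record type line, the function aligned_block and the columns 0 and 4 are assumptions made for illustration, and every source line is reduced to a single token.

```ocaml
(* One token per source line, reduced to its start column and its text. *)
type line = { col : int; text : string }

(* Hand-written model of [one_or_more (align p) |> indent min_indent]:
   take the longest run of lines that all start in the same column, provided
   that column lies at least [min_indent] columns to the right of
   [parent_col]. Returns the texts of the block and the remaining lines, or
   [None] if the first line is not indented enough. *)
let aligned_block ~parent_col ~min_indent (lines : line list) :
    (string list * line list) option =
  match lines with
  | first :: _ when first.col >= parent_col + min_indent ->
    let rec go acc = function
      | l :: rest when l.col = first.col -> go (l.text :: acc) rest
      | rest -> (List.rev acc, rest)
    in
    Some (go [] lines)
  | _ -> None

(* The lines following the first "xxxxxxxx" of the schematic layout,
   with assumed columns 4 for the block and 0 for the environment. *)
let example =
  [ { col = 4; text = "pppppppp" };
    { col = 4; text = "pppppp" };
    { col = 4; text = "pppp" };
    { col = 0; text = "xxxxx" } ]

let () =
  match aligned_block ~parent_col:0 ~min_indent:1 example with
  | Some (items, rest) ->
    Printf.printf "%d aligned items, %d lines left\n"
      (List.length items) (List.length rest)
  | None -> print_endline "no indented block"
```

The block ends at the first line that leaves the alignment column, so the final xxxxx line is handed back to the environment. A real combinator library has to track this per token and across nested blocks, which is what the three layout combinators hide from the parser writer.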
User Friendly Error Messages
It is important for a parser writer to make syntax error messages user
friendly. Fmlib has some support for writing friendly error messages. There is
the operator <?>, copied from the Haskell library parsec, which helps to equip
combinators with descriptive error messages in case they fail to parse the
construct successfully.
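The idea behind such a labelling operator can be shown with a toy parser type in plain OCaml. This is a sketch under assumptions, not Fmlib's or parsec's implementation: the type 'a parser, the helpers char_, >>. and explode, and the value sequence_marker are all made up for illustration. As a simplification, the toy relabels every failure and does not track whether input was already consumed.

```ocaml
(* A toy parser: consume a prefix of the input characters and return a value
   plus the rest, or fail with a list of expectations. *)
type 'a parser = char list -> ('a * char list, string list) result

(* Parse exactly the character [c]. *)
let char_ (c : char) : char parser = function
  | x :: rest when x = c -> Ok (c, rest)
  | _ -> Error [ Printf.sprintf "'%c'" c ]

(* Run [p] and then [q], keeping the result of [q]. *)
let ( >>. ) (p : 'a parser) (q : 'b parser) : 'b parser =
 fun input ->
  match p input with
  | Ok (_, rest) -> q rest
  | Error e -> Error e

(* Relabel [p]: if it fails, report [msg] instead of its low-level
   expectations. A successful result is passed through unchanged. *)
let ( <?> ) (p : 'a parser) (msg : string) : 'a parser =
 fun input ->
  match p input with
  | Ok _ as ok -> ok
  | Error _ -> Error [ msg ]

(* Turn a string into the list of its characters. *)
let explode (s : string) : char list =
  List.init (String.length s) (String.get s)

(* "- " introduces a sequence element in a YAML-like language. *)
let sequence_marker : char parser =
  char_ '-' >>. char_ ' ' <?> "sequence element: \"- <yaml value>\""

let () =
  match sequence_marker (explode "-x") with
  | Ok _ -> print_endline "parsed"
  | Error expectations -> List.iter print_endline expectations
```

Without the label the failure would only mention the missing blank. With it the user is told which construct was expected at that point.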
At the end of a failed parse, the syntax (or semantic) errors have to be
presented to the user. Suppose there is a combinator parser for a YAML-like
structure. By default the library writes error messages for you in the form
1 |
2 | names:
3 | - Alice
3 | - Bob
4 |
5 | category: encryption
^
I have encountered something unexpected. I was
expecting one of
- at 3 columns after
- sequence element: "- <yaml value>"
- at 2 columns before
- key value pair: "<key>: <yaml value>"
- end of input
The raw information (line and column numbers, individual expectations, failed
indentation or alignment expectations) is available as well, so that you can
present the error messages to the user in any other form.
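As an illustration of a custom presentation, the following sketch renders a report from such raw data in plain OCaml. The record syntax_error and the function render are hypothetical and only stand in for whatever the library actually exposes. Line and column are assumed to be zero-based and non-negative.

```ocaml
(* Hypothetical raw error data; the types offered by the library may differ. *)
type syntax_error = {
  line : int;                  (* zero-based line of the failure *)
  column : int;                (* zero-based column of the failure *)
  expectations : string list;  (* descriptions of what was expected *)
}

(* Render the offending source line with a caret below the failure position,
   followed by the list of expectations. *)
let render (source : string) (e : syntax_error) : string =
  let buf = Buffer.create 128 in
  (match List.nth_opt (String.split_on_char '\n' source) e.line with
   | Some text ->
     (* Line numbers are shown one-based, as in the default messages. *)
     let prefix = Printf.sprintf "%d | " (e.line + 1) in
     Buffer.add_string buf (prefix ^ text ^ "\n");
     Buffer.add_string buf
       (String.make (String.length prefix + e.column) ' ' ^ "^\n")
   | None -> ());
  Buffer.add_string buf "I was expecting one of\n";
  List.iter
    (fun x -> Buffer.add_string buf ("  - " ^ x ^ "\n"))
    e.expectations;
  Buffer.contents buf

let () =
  print_string
    (render "names:\n  - Alice\n category: x"
       { line = 2;
         column = 1;
         expectations = [ "key value pair"; "end of input" ] })
```

Keeping the raw data separate from its rendering makes it possible to produce a terminal report, an editor diagnostic or an HTML page from the same failed parse.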
The library also contains a component Fmlib_pretty for pretty printing any ASCII text.