[ANN] reparse 2.0.0

Version 2.0.0 of reparse has been released to opam.

Reparse is a comprehensive, monadic, recursive-descent parser construction library for OCaml.

CHANGES for version 2.0.0:
  • Rewrites the whole package to use exceptions rather than the result type
  • Adds many more parsing combinators
  • Adds comprehensive unit tests
  • Adds comprehensive documentation, hosts it, and links to it from the repo home page
  • Adds an abstraction for the input source
  • Provides a unix file source and a string input source
  • Adds a separate package, reparse-unix, for unix file input
  • Adds calc.ml and json.ml to the examples

Additionally, the API is now comprehensively documented, with at least one example for each API call.

Enjoy.

B.

6 Likes

I’d like to know the rationale behind this change. Would you have time to comment?

3 Likes

Ah, yes. It’s been a while since I made the decision so let me recollect a bit. I will most likely write a blog post and post it here. Stay tuned.

3 Likes

How does this compare with Angstrom, which is also a parser combinator library? It supports parsing binary formats but it works for text formats as well.

1 Like

Apologies, I haven’t used Angstrom yet to be able to comment on it.

I use angstrom extensively, so I’ll definitely be taking a look at reparse. It’s a pity you weren’t aware of angstrom prior to starting work on it; of course, being familiar with prior work can often usefully inform one’s own.

I wonder if this library is something you wanted to build and share regardless, or if you had unsuccessfully looked for something like it previously? Despite good work in the community, discovering appropriate libraries can still be a bit of a challenge sometimes compared to fuller ecosystems.

1 Like

The library grew organically, mostly out of my preference for, and experience of, writing recursive descent parsers. Thus the name reparse.

My main motivation was “I like writing recursive descent parsers, how can I make the process of writing them productive and efficient”. The main pain point I initially wanted to address was to somehow abstract/encapsulate the parser state operations, e.g. get next char, is it eof, create lexer/parser buffer and so on. That way, if I wanted to implement a parser for one of the HTTP RFCs, I wouldn’t have to worry about parser state management every time I started work on a new parser.
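To make the idea of encapsulated parser state concrete, here is a minimal sketch. It is not reparse's API: the `state` record, `Parse_error`, `next`, `char`, `take_while`, `parse` and `header` are all hypothetical names chosen for illustration, and the input is assumed to be a plain string.

```ocaml
(* Toy sketch only: every name here is hypothetical, not reparse's API. *)

exception Parse_error of { offset : int; msg : string }

(* The parser state: the input and the current offset into it. *)
type state = { input : string; mutable offset : int }

(* A parser is a function from the state to a value; it raises on failure. *)
type 'a t = state -> 'a

let is_eof st = st.offset >= String.length st.input

let fail st msg = raise (Parse_error { offset = st.offset; msg })

(* Consume and return the next character. *)
let next : char t = fun st ->
  if is_eof st then fail st "unexpected end of input"
  else begin
    let c = st.input.[st.offset] in
    st.offset <- st.offset + 1;
    c
  end

(* Consume [c], or fail without consuming anything. *)
let char c : char t = fun st ->
  if (not (is_eof st)) && st.input.[st.offset] = c then next st
  else fail st (Printf.sprintf "expected %C" c)

(* Consume characters while [f] holds and return them as a string. *)
let take_while f : string t = fun st ->
  let start = st.offset in
  while (not (is_eof st)) && f st.input.[st.offset] do
    st.offset <- st.offset + 1
  done;
  String.sub st.input start (st.offset - start)

(* Run a parser on a string, starting at offset 0. *)
let parse (p : 'a t) input = p { input; offset = 0 }

(* Example: a "name:value" pair. No offset bookkeeping appears here. *)
let header : (string * string) t = fun st ->
  let name = take_while (fun c -> c <> ':') st in
  let _ = char ':' st in
  let value = take_while (fun _ -> true) st in
  (name, value)
```

With these helpers, a new parser such as `header` only combines existing pieces; the end-of-input checks and offset updates live in one place.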

It was doable and easy enough that I went ahead and did it. v0.0.1 (not released) was just that. I used that in a few of my parsers and that worked quite well. From then onward it started taking on a shape of its own.

As an aside, v1.0.0 used the result type underneath as the central data type. However, I became quite disillusioned with the development experience of it, so I removed it in v2.0.0 and used the venerable exception type.

Hope this helps!

1 Like

Exceptions vs. results in a library isn’t a massive hurdle, though I personally tend to catch and wrap exceptions as close to their sources as possible (and use exceptions in my own code mostly only for non-local jumps/returns). I think that is mostly orthodoxy these days, though of course tastes vary.

Thanks very much for the background in general!

@rixed My initial - first draft - thoughts on exceptions vs result. https://lemaetech.co.uk/articles/exceptions.html

4 Likes

Thank you for having written this down. I came to the same conclusion for similar reasons, plus I very much prefer it when the code is not splattered with error-passing code (esp. when dealing with several libraries that use different conventions, as you noted).
To me OCaml syntax is lean and light with exceptions, but using monadic error handling lowers the signal-to-noise ratio of the code significantly.
If only we had a way to know statically which exceptions can escape any function, that would be the best of both worlds!
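The contrast can be sketched with a small example. The functions below (`parse_int_r`, `sum_r`, `parse_int_e`, `sum_e`) and the `Not_an_integer` exception are hypothetical and only illustrate the two styles; the `let*` binding operator assumes OCaml 4.08 or later.

```ocaml
(* Illustrative only: all names below are hypothetical. *)

(* Result style: every step is threaded through a bind operator. *)
let ( let* ) = Result.bind

let parse_int_r s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (Printf.sprintf "not an integer: %S" s)

let sum_r a b c =
  let* x = parse_int_r a in
  let* y = parse_int_r b in
  let* z = parse_int_r c in
  Ok (x + y + z)

(* Exception style: the happy path reads like ordinary code. *)
exception Not_an_integer of string

let parse_int_e s =
  match int_of_string_opt s with
  | Some n -> n
  | None -> raise (Not_an_integer s)

let sum_e a b c = parse_int_e a + parse_int_e b + parse_int_e c
```

Note the trade-off: the type of `sum_r` records that it can fail, whereas the type of `sum_e` (`string -> string -> string -> int`) says nothing about `Not_an_integer`, which is exactly the missing static information mentioned above.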

5 Likes

Whoops, just found https://lemaetech.co.uk/articles/exceptions.html; ignore the question below.

Do you have any more details on this? I have been using result rather than exceptions for years and find it mostly superior to exceptions, and am curious what your experience was.

Thank you very much for your writeup! It ended up spurring me on to some further reading and conclusions that I formed into a separate post here:

1 Like

For what it’s worth, I remember making a similar choice to use the exn type for signaling errors when I was designing the Orsetto recursive descent parser library. Mostly for the same reasons as you, but also because it facilitated composition of error recovery.

For example, Orsetto makes this function available to parsers:

    (** The error check scanner. Use [ck p] to create a new scanner that
        produces either [Ok v] if [p] produces [v] or [Error x] if scanning the
        input with [p] raises an exception [x].
    *)
    val ck: 'r t -> ('r, exn) result t
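To show how a combinator with this signature can work, here is a toy model. It is not Orsetto's implementation: the parser type `'r t`, the `any_char` parser and the `Unexpected_eof` exception are hypothetical stand-ins, and the choice to leave the position unchanged on error is an assumption of this sketch.

```ocaml
(* Toy model only: this is not Orsetto's implementation. The parser type,
   [any_char] and [Unexpected_eof] are hypothetical stand-ins. *)

(* A parser reads from a string at a position and returns a value together
   with the new position; it raises on failure. *)
type 'r t = string -> int -> 'r * int

exception Unexpected_eof

(* Consume one character, or raise at the end of the input. *)
let any_char : char t = fun s pos ->
  if pos < String.length s then (s.[pos], pos + 1) else raise Unexpected_eof

(* [ck p] runs [p]. A raised exception [x] becomes [Error x]; in this sketch
   the position is then left unchanged (an assumption, not a documented
   property of Orsetto). *)
let ck (p : 'r t) : ('r, exn) result t = fun s pos ->
  match p s pos with
  | (v, pos') -> (Ok v, pos')
  | exception x -> (Error x, pos)
```

Because `ck p` is itself a parser, a caller can inspect the `Error x` case and try an alternative, which is one way recovery logic can compose with other parsers.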

p.s. I was aware of Angstrom when it was introduced. The origins of Orsetto preceded it. I’ve been supporting Orsetto for a very very long time. I am an old person.

1 Like

Also, I have a hunch that JSON Test Suite won’t like that example JSON parser very much. (Don’t feel bad: it hates everyone.)

2 Likes

I just finished updating all the copyright notices for 2020, and I’m somewhat knocked on my heels by the fact that some of these files are now old enough to be eligible for the military draft. And when I started messing around with OCaml, the compiler group had already end-of-lifed three previous major versions of the tool chain.

All this is to say— I’m taking this moment to reflect and appreciate the maturity of the OCaml language ecosystem.

1 Like

On a related note, see Roberto Di Cosmo’s experience with resurrecting OCaml code from 1998. Finding the sources was difficult, but the OCaml 1.07 sources compiled fine with OCaml 4.05, 20 years later.

10 Likes