[ANN] Parseff: parser combinator library for OCaml 5

oh… that’s one combinator that I wouldn’t discard adding!

EDIT: Nevermind, the usage is parsera \#or parserb

Yes, don’t disagree, but If there’s combinator style worth adding it can be always added as a separate package or even in user’s code

Thanks, would you mind opening a PR? Happy to have a Angstron (optimised) version on the repo!

No, it’s parsera or parserb :slight_smile:

maybe ocamlformat issue? dunno.

That’s a cute suggestion, but I don’t want to add operators in parseff

Share it on https://x.com/davesnx/status/2036739000271007997
Let me expand here:

I’m exploring the idea of making parseff faster, autoresearch suggested to remove effects entirely from parseff and it makes fused operations equal, but primitive operations are much faster.

Kind of loses the purpose of the name tbh, but frankly, the comparison is:

46-75% faster without effects because each combinator call is now a direct function call + DLS.get (~5ns) instead of Effect.perform + handler dispatch + continuation resume (~50-100ns)

  1. This PR Bench angstrom: make fair and angstrom more comparable by reynir · Pull Request #6 · davesnx/parseff · GitHub changed a bit the parseff (fair) version and angstrom to be actually slower
  2. This commit Performance optimizations: +34% json_fused, +48% arith_generic throug… · davesnx/parseff@8d83a14 · GitHub creats an optimised version of anstrom and does micro optimisations on parseff (0.3.0)

So, parseff 0.4.0 will probably ship without effects and be faster than angstron in any case, but for fused operations it will be almost the same.

It looks like the alternatives parser will be much longer without specialized effect-based backtracking combinator. Is there any chance to make effects and reimplementation of recursive decent live together?

Also, if we want seriously talk about JSON parsing performance, we need to compare with GitHub - simdjson/simdjson: Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks · GitHub too

The public interface hasn’t changed. What do you mean, exactly?

We aren’t seriously comparing json parsing perf, we are comparing similar libraries.

Not really surprising but we should be able to do better than ~50-100ns for the effects version. Could you say which OCaml version you used and what hardware it was on?

Feel free to go deeper with this branch No effects version by davesnx · Pull Request #7 · davesnx/parseff · GitHub and running those benchmarks parseff/bench at 41cc5852b3cbc78e00710b4c6727211d476d7cee · davesnx/parseff · GitHub

OCaml 5.4.0

Hardware

  • CPU: 2x AMD EPYC 9755 128-Core Processor
  • Cores/threads: 256 physical cores, 512 logical CPUs total
  • RAM: 2.2 TiB total (771 GiB in use)

Dumb question: how does it compare to menhir in terms of performance?

Hard to say, because they aren’t comparable or at least I have no idea how to do it.

Mostly because menhir is a parser generator and it will be used inside a pipeline with a Lexer, while parseff (or any parser combinator) often is used to parse strings; you could do Source.of_function and have some lexer producing tokens but still they are solving different problems.

Any parser combinator has backtracking, and it builds up from small functions, conflicts don’t exist while menhir is more like a monolithic "table-driven shift/reduce on a stream of tokens) which doesn’t compose with the rest of your program and has an implicit error mechanism.

If you compare them somehow it will depend on the syntax as well, and in the menhir’s case you might need to ignore the lexing time? Anyway, no idea how to make justice here, hope it helps