Orsetto: assorted structured data interchange languages



I have made available one of my personal side projects, Orsetto, for community review. My plan is for this library to be an assortment of structured data interchange languages organized around a common foundation of general-purpose utilities. At the moment, only JSON is implemented, and I’m planning to implement CBOR next.

Much of the foundation layer in Orsetto originated in my now outdated Oni project. That project also included some experimental work in concurrent network I/O, which I’m hoping to revisit in the context of the algebraic effects feature in development on the OCaml multicore project. In the meantime, Orsetto is generally useful in OCaml 4.06.1 and later.

I haven’t posted a package on OPAM yet. I’m waiting for the OPAM 2.0 migration. If you’re interested in playing with it now, you’ll need to be using OPAM 2.0: clone the repository and install it from a pinned local source tree.


Recent commits have included work in progress on the Unicode regular expression implementation. I’m still a way off from claiming RL1 conformance, but I can see the light at the end of that tunnel.

I’m also giving some thought to bringing over the MIME implementation from the old Oni side branch. Or not. Maybe MIME will go away if we clap hard enough and sing real loud.


Recent commits now include a flyweight implementation of Concise Binary Object Representation (CBOR) [RFC 7049] using the Cf_encode and Cf_decode interfaces.
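
For readers who haven't looked at CBOR before, here is a minimal sketch of how RFC 7049 encodes an unsigned integer (major type 0). It uses only the OCaml standard library and does not go through the Cf_encode interface; the name cbor_encode_uint is hypothetical and is not part of Orsetto.

```ocaml
(* Illustrative only: encode a non-negative int as a CBOR unsigned integer
   (major type 0, RFC 7049 section 2.1). Assumes a 64-bit OCaml runtime.
   [cbor_encode_uint] is a hypothetical name, not an Orsetto function. *)
let cbor_encode_uint n =
  if n < 0 then invalid_arg "cbor_encode_uint: negative";
  let b = Buffer.create 9 in
  (* Append the [len] low-order bytes of [n] in big-endian order. *)
  let add_be len =
    for i = len - 1 downto 0 do
      Buffer.add_char b (Char.chr ((n lsr (8 * i)) land 0xff))
    done
  in
  (* Values below 24 fit in the initial byte; larger values follow a
     one-byte header that announces a 1, 2, 4 or 8 byte argument. *)
  if n < 24 then Buffer.add_char b (Char.chr n)
  else if n < 0x100 then (Buffer.add_char b '\x18'; add_be 1)
  else if n < 0x10000 then (Buffer.add_char b '\x19'; add_be 2)
  else if n < 0x100000000 then (Buffer.add_char b '\x1a'; add_be 4)
  else (Buffer.add_char b '\x1b'; add_be 8);
  Buffer.contents b
```

The sketch always picks the shortest form, which is what the RFC's canonical CBOR rules ask for. For example, 1000 encodes as the three bytes 19 03 e8, matching the examples in Appendix A of the RFC.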

Also, I’ve made more progress on Unicode regular expressions. The remaining pieces of RL1 conformance in Ucs_regx seem less pressing, because the need for regular expressions in Orsetto is somewhat limited to being an extension of the more general Cf_scan and Ucs_scan parsers. I plan to implement Unicode line and word boundary matching as scanners first, then as regular expression forms later.


Over my summer vacation, I found the time to implement The Base16, Base32, and Base64 Data Encodings [RFC 4648]. I’ve also made some minor improvements to the Cf_encode and Cf_decode modules.
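
As a rough illustration of what these encodings do, here is a standalone sketch of the Base16 and Base64 encoders from RFC 4648, written against the OCaml standard library only. The function names are hypothetical; the Orsetto modules expose their own interface, which this does not try to reproduce.

```ocaml
(* Illustrative only: standalone RFC 4648 encoders, not the Orsetto API. *)

(* Base16: each input byte becomes two uppercase hexadecimal digits. *)
let base16_encode s =
  let hex = "0123456789ABCDEF" in
  String.init (2 * String.length s) (fun i ->
    let c = Char.code s.[i / 2] in
    if i land 1 = 0 then hex.[c lsr 4] else hex.[c land 0xf])

(* Base64: each group of three bytes becomes four characters from the
   standard alphabet, with '=' padding when the final group is short. *)
let base64_encode s =
  let alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" in
  let n = String.length s in
  let b = Buffer.create ((n + 2) / 3 * 4) in
  (* Bytes past the end of the input are treated as zero bits. *)
  let byte i = if i < n then Char.code s.[i] else 0 in
  let rec loop i =
    if i < n then begin
      let w = (byte i lsl 16) lor (byte (i + 1) lsl 8) lor byte (i + 2) in
      Buffer.add_char b alphabet.[(w lsr 18) land 63];
      Buffer.add_char b alphabet.[(w lsr 12) land 63];
      Buffer.add_char b
        (if i + 1 < n then alphabet.[(w lsr 6) land 63] else '=');
      Buffer.add_char b (if i + 2 < n then alphabet.[w land 63] else '=');
      loop (i + 3)
    end
  in
  loop 0;
  Buffer.contents b
```

With the test vectors from section 10 of the RFC, base16_encode "foobar" gives "666F6F626172" and base64_encode "fo" gives "Zm8=".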


I refactored the old Nx_uri module from Oni into the new Cf_uri module in Orsetto and it’s passing all the unit tests. Some differences worth noting between my implementation and the URI package from the Mirage project:

  1. The Mirage implementation uses the Re package on OPAM, and the Orsetto implementation uses its own Cf_regx module for constructing the URI component scanners. The lazy DFA engine for Cf_regx is the same one used by the Ucs_regx module that I mentioned previously in this thread.
  2. The Mirage implementation includes logic for parsing key/value parameters in query parts, and the Orsetto implementation does not. (Providing delimiter-separated value structures is a separate task. See Issue #15 for details.)
  3. The Orsetto implementation does not implement structural equality and comparison on the decomposed structures. (I don’t see the point. String equality and comparison on the composed text should suffice.)
  4. The Orsetto implementation uses syntax-based normalization, not protocol-based normalization. This is because the Cf_uri module comprises only the generic syntax for URI and URI references with the generic path resolving logic. It doesn’t comprehend any of the protocol-specific extensions of the URI syntax. According to my design, those protocol-based normalization rules for URI should be implemented by extending the generic syntax-based normalization provided in the Cf_uri module.
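
To make "syntax-based normalization" a bit more concrete, here is a small sketch of one of the rules from RFC 3986 section 6.2.2: percent-encoded triplets get uppercase hex digits, and triplets that encode an unreserved character are decoded. It is standalone OCaml with hypothetical names, not the Cf_uri code, and it leaves out the other syntax-based rules (lowercasing the scheme and host, removing dot segments).

```ocaml
(* Illustrative only: percent-encoding normalization per RFC 3986
   sections 6.2.2.1 and 6.2.2.2. Hypothetical names, not the Cf_uri API. *)

(* The "unreserved" characters of RFC 3986 section 2.3. *)
let is_unreserved c =
  match c with
  | 'A'..'Z' | 'a'..'z' | '0'..'9' | '-' | '.' | '_' | '~' -> true
  | _ -> false

let hex_value c =
  match c with
  | '0'..'9' -> Some (Char.code c - Char.code '0')
  | 'a'..'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A'..'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

let normalize_pct s =
  let n = String.length s in
  let b = Buffer.create n in
  let rec loop i =
    if i >= n then ()
    else begin
      (* Recognize a well-formed "%XX" triplet starting at position [i]. *)
      let triplet =
        if s.[i] = '%' && i + 2 < n then
          (match hex_value s.[i + 1], hex_value s.[i + 2] with
           | Some hi, Some lo -> Some (hi * 16 + lo)
           | _ -> None)
        else None
      in
      match triplet with
      | Some code ->
        let c = Char.chr code in
        (* Decode unreserved characters; otherwise keep the triplet but
           write its hex digits in uppercase. *)
        if is_unreserved c then Buffer.add_char b c
        else Buffer.add_string b (Printf.sprintf "%%%02X" code);
        loop (i + 3)
      | None ->
        (* Anything else, including a malformed '%', is copied through. *)
        Buffer.add_char b s.[i];
        loop (i + 1)
    end
  in
  loop 0;
  Buffer.contents b
```

For example, normalize_pct "%7euser/%c3%a9" gives "~user/%C3%A9". Once two URI strings have been through this kind of normalization, plain string equality on the composed text is a reasonable comparison, which is the idea behind item 3 above.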

As mentioned above, anyone interested in tracking my progress is invited to follow the issue reports in the Bitbucket repository.