[ANN] Orsetto: structured data interchange languages (release 1.1)

jhw · June 12, 2021, 7:49pm

Announcing the release to OPAM of Orsetto 1.1, an update to a personal project of mine not sponsored by my employer. Licensed with BSD 2-Clause.

Q. What is Orsetto?

Orsetto aspires to eventually do for OCaml more or less what Serde has done for Rust, i.e. to be a curated and self-contained collection of structured data interchange languages with a cohesive and unified model of serialization and deserialization.

Two interchange languages are currently supported: JSON and CBOR.

Q. What is new in this release?

The change log for the release is here: CHANGES.md

Drops support for OCaml < 4.08.
Major improvements in test coverage.
Many corrections for logic errors.
Issues database migrated to Atlassian Jira Cloud.
A few important usability improvements (see below).

Some usability improvements are meant to replace obsolescent interfaces (now marked with @ocaml.deprecated). A few interfaces with especially suboptimal design were updated with breaking changes, but these are not expected to cause major upgrade problems for anyone.

Some things have not changed:

Still requires ocamlfind and does not build with dune.
Still only supports JSON and CBOR.

Q. It looks incomplete. What are your plans for future development?

Yes, it’s a personal project. I’d welcome opportunities to collaborate with others who share my vision for the project. As long as it’s just me working on this, development will be somewhat slow. I have a lot of projects. This is the only open source one.

Orsetto 1.0.3 is the previous release. It offered parsers and emitter combinators for JSON and CBOR for OCaml >= 4.06.1 (including 4.13~alpha1). The quality of its JSON support is adequate, and it scores well on the nst/JSONTestSuite tests. The quality of its CBOR support is provisional, and not recommended.
Orsetto 1.1 is the current release. It adds generalized and extensible structured data interchange models with specializations for producing emitters and parsers for JSON and CBOR. The quality of the CBOR support is much improved.
Orsetto 1.2 is the next planned release. It will drop interfaces marked @ocaml.deprecated in the 1.1 release. It will also drop support for OCaml < 4.10, and I’d like to stop depending on ocamlfind. I hope to add a PPX for deriving parsers and emitters from OCaml data type definitions. I might also consider one or more new interchange languages; suggestions are heartily encouraged.

dakk · June 13, 2021, 8:28am

Good job; for 1.2 I suggest x-www-form-urlencoded or YAML as new interchange languages.

PS: Do you know what “orsetto” means in Italian?

jhw · June 13, 2021, 4:35pm

Thanks for suggesting x-www-form-urlencoded. Orsetto already has a URI parser/emitter (see here) with support for percent-encoding, but it needs this enhancement for key-value pairs. I’ll file a task request for it. (Update: it’s ORS-95.)

Yes! Italian isn’t my native language, but I’ve been studying it for about seven years now, and I can get around in it okay. Pretty much all of my partner’s family in Rome speaks no English, so I’ve had to learn Italian to keep up around the dinner table when I visit.

joelb · June 17, 2021, 1:18am

Hi jhw, nice looking project! I have two questions: (1) Are there any examples available? (2) Any guidance on when I should use Orsetto compared to the existing cbor package?

jhw · June 17, 2021, 1:56am

Hey, thanks for taking a look!

(1) Until the PPX layer is ready for publishing, any but the most simplistic examples would be more gnarly than I really want. I’d like for the initial set of examples to look like the ones on the front page of the Serde project, i.e. with PPX derivers for data ingestors and renderers. Maybe next quarter when I plan to release a 1.1.1 patch update, there will be a folder with some simple example programs.

(2) One noteworthy observation I would make about the other cbor package on OPAM right now: it’s more or less the same as my Cbor_flyweight module, i.e. it comprises a sum type that represents any well-formed CBOR value and some functions for encoding to a string and decoding from a string. When you use an interface like this, you still have to write the translator between your application data types and the distinguished CBOR value type. The model used by Orsetto is more like Serde, in that you parse and emit directly to and from your application data types.
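
To make the flyweight style concrete, here is a minimal sketch of the hand-written translation layer it requires. Everything in it is hypothetical: the `value` type is a cut-down stand-in for a "any well-formed value" sum type, not the actual type exposed by Cbor_flyweight or the cbor package, and `point` is an invented application type.

```ocaml
(* Hypothetical flyweight value type: a simplified stand-in, NOT the real
   type from Orsetto's Cbor_flyweight or from the cbor package. *)
type value =
  | Int of int
  | Text of string
  | Array of value list
  | Map of (value * value) list

(* Hypothetical application data type. *)
type point = { label : string; x : int; y : int }

(* With a flyweight interface, the application writes both directions of
   the translation by hand, for every application type. *)
let value_of_point p =
  Map [ (Text "label", Text p.label); (Text "x", Int p.x); (Text "y", Int p.y) ]

let point_of_value = function
  | Map fields ->
    (match
       ( List.assoc_opt (Text "label") fields,
         List.assoc_opt (Text "x") fields,
         List.assoc_opt (Text "y") fields )
     with
     | Some (Text label), Some (Int x), Some (Int y) -> Some { label; x; y }
     | _ -> None)
  | _ -> None
```

The encoder and decoder for the wire format then operate only on `value`, so this translation code sits between them and the application. The Serde-like approach aims to remove that intermediate step.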

To summarize, this release is still harder to teach than I would like it to be, and I expect that to remain a problem until at least the 1.2 release. Part of the issue there is that a wide variety of structured data modeling languages are available, and their usage scenarios are widely disparate, e.g. JSON Schema, CDDL, G*Protobufs, Thrift, ASN.1, Yang, et cetera. Some are used for compiling IDL specifications into application data structures tightly coupled to their encoding rather than their function in the application. Some are used for writing schema validators and/or protocol technical specifications, and they are not intended to describe the structure of any application data. Orsetto has no opinion on which of these is best. It doesn’t implement any particular modeling language, and it probably never will.

Nevertheless, describing the conversion of application data types to and from a generalized data model suitable for use with JSON, CBOR and other interchange languages is gnarly. In Orsetto 1.0, it entails writing custom parsers and emitters for each combination of an application data type and its representation in each interchange language. In Orsetto 1.1, you still have to compose a gnarly structure representing the schema for encoding and decoding your application data type, but at least there are straightforward functions for compiling those into parsers and emitters for each interchange language you want to use. In Orsetto 1.2, I hope to make it substantially easier to specify data schemas with PPX decorations of your application data. At that point, it should be as easy to learn to use as Serde.
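
The general idea of "one schema value, compiled into an emitter per interchange language" can be sketched in plain OCaml with a GADT. This is only an illustration of the technique under stated assumptions: the `schema` type and the `to_json` and `to_sexp` functions are hypothetical names, not Orsetto's actual model interface, and the second output format is a toy S-expression syntax standing in for another interchange language.

```ocaml
(* Hypothetical schema description, indexed by the application type it
   describes. NOT Orsetto's actual model interface. *)
type _ schema =
  | Integer : int schema
  | Text : string schema
  | Array_of : 'a schema -> 'a list schema
  | Pair : 'a schema * 'b schema -> ('a * 'b) schema

(* Compile a schema into an emitter of JSON-like text. Strings use OCaml's
   %S escaping, which matches JSON only for simple ASCII text. *)
let rec to_json : type a. a schema -> a -> string = fun s v ->
  match s with
  | Integer -> string_of_int v
  | Text -> Printf.sprintf "%S" v
  | Array_of e -> "[" ^ String.concat "," (List.map (to_json e) v) ^ "]"
  | Pair (a, b) ->
    let (x, y) = v in
    "[" ^ to_json a x ^ "," ^ to_json b y ^ "]"

(* Compile the same schema into an emitter for a toy S-expression format. *)
let rec to_sexp : type a. a schema -> a -> string = fun s v ->
  match s with
  | Integer -> string_of_int v
  | Text -> Printf.sprintf "%S" v
  | Array_of e -> "(" ^ String.concat " " (List.map (to_sexp e) v) ^ ")"
  | Pair (a, b) ->
    let (x, y) = v in
    "(" ^ to_sexp a x ^ " " ^ to_sexp b y ^ ")"
```

The schema is written once per application type, and each output format needs only one interpreter over it. A PPX deriver would, in principle, generate the schema value from the type definition so that it no longer has to be composed by hand.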

joelb · June 17, 2021, 2:44am

Thanks @jhw. I agree that Serde-like usability is a good goal. Looking forward to 1.2 to try it out!

anentropic · June 18, 2021, 11:25am

The README would benefit from some examples to help illustrate what it does and how it is used
