[ANN] Orsetto: structured data interchange languages (release 1.1.2)

Announcing the release to OPAM of Orsetto 1.1.2, an update to a personal project of mine not sponsored by my employer. Licensed with BSD 2-Clause.

Q. What is Orsetto?

Orsetto aspires eventually to do for OCaml more or less what Serde has done for Rust, i.e. to be a curated and self-contained collection of structured data interchange languages with a cohesive and unified model of serialization and deserialization.

Two interchange languages are currently supported: JSON and CBOR.

Q. What is new in this release?

Mostly error corrections, particularly in the CBOR library, uncovered by improving test coverage. The change log for the release is here: CHANGES.md

Highlights:

  • Major improvements in test coverage.
  • Many corrections for logic errors.
  • A few minor usability improvements.

Some things have not changed:

  • Still has no Programmer Guide or Tutorial, or really any introduction at all.
  • Still requires ocamlfind and builds with omake, which is currently not compatible with OCaml 5.0.
  • Still only supports JSON and CBOR.

Q. It looks incomplete. What are your plans for future development?

Yes, it’s a personal project, and yes, I’m aware there are no reverse dependencies on it currently in the public OPAM repository. Still, I’d welcome opportunities to collaborate with others who share my vision for the project. As long as it’s just me working on this, development will continue to be somewhat slow, as I’m prone to over-engineer things I care about. I have a lot of projects, and this is the only open source one.

  • Orsetto 1.1.2 is the current release. It adds generalized and extensible structured data interchange models with specializations for producing emitters and parsers for JSON and CBOR. The quality of the CBOR support is much improved, and I’m using it with good results in other projects. Supported on OCaml >= 4.08.

  • Orsetto 1.2 is the next planned release. It will drop interfaces marked @caml.deprecated in the 1.1 release. It will also drop support for OCaml < 4.10, and it will stop depending on ocamlfind. I hope to add a PPX for deriving parsers and emitters from OCaml data type definitions. I might also consider one or more new interchange languages; suggestions are heartily encouraged.

P.S. This release has been ready on my development branch for months. It was delayed by a constellation of interdependent obstacles to performing the release ritual, which I could only address all at once by setting aside a considerable block of personal time. My apologies for that. I’ve finally got everything repaired, and further error correction updates should be released on a more timely basis going forward.

How does this differ from ATD?
Just curious.

Orsetto and ATD differ at the level of basic operating philosophy.

If you use a system like ATD, then you define messages with a data modeling language, then 1) use a tool that generates program source code for the application data structures that represent those messages, and 2) use an associated library that can encode and decode those messages in one or more interchange languages. (There are many competing systems like this, e.g. Google Protocol Buffers, Corba DDS, ASN.1, et cetera, and each of them defines its own tools for generating program source code from message schema definitions.)

If you use a system like Orsetto, then you define your application data structures directly in your program source, without a separate data modeling language, and you use a library for describing how to encode and decode them directly using various interchange languages. (In the OCaml ecosystem, the most popular way of doing this today is with ppx_deriving extensions.)
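To make the second philosophy concrete, here is a minimal sketch in plain OCaml. It does not use Orsetto's actual API: the `value` tree, the `point` type, and the `encode_point` and `decode_point` functions are all hypothetical names invented for illustration. The point is only that the application type lives in ordinary program source and the encoder and decoder are written (or derived) against it.

```ocaml
(* Hypothetical sketch: none of these names come from Orsetto or ATD. *)

(* A tiny generic value tree standing in for an interchange language. *)
type value =
  | Int of int
  | Text of string
  | Record of (string * value) list

(* The application type is defined directly in program source. *)
type point = { label : string; x : int; y : int }

(* Hand-written encoder: application type -> interchange value. *)
let encode_point p =
  Record [ ("label", Text p.label); ("x", Int p.x); ("y", Int p.y) ]

(* Hand-written decoder: interchange value -> application type,
   returning None when the value is not a well-formed point. *)
let decode_point = function
  | Record fields ->
    (match
       List.assoc_opt "label" fields,
       List.assoc_opt "x" fields,
       List.assoc_opt "y" fields
     with
     | Some (Text label), Some (Int x), Some (Int y) -> Some { label; x; y }
     | _ -> None)
  | _ -> None
```

A ppx_deriving-style extension would generate functions of roughly this shape from the `point` declaration instead of having them written by hand.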

The difference in these two philosophies becomes most apparent in how they each approach problems of protocol evolution.

With a data modeling language, you can modify an existing message definition to add new fields and choices, deprecate (not remove) obsolete fields and choices, and sometimes (depending on how rich the modeling language is) change other validation constraints. When you make changes in a message definition, the module signature of the generated program source changes to reflect them, and application source code that manipulates data represented in those messages may have to change to accommodate the new module signature.

In a system where messages are directly modeled by programming language data type specifications, evolving a protocol to include updates to the data model entails changing serialization and deserialization functions corresponding to the application data types representing the messages. Where those functions are derived systematically from the application data type specification, the data model of the protocol tends to be more tightly coupled to the programming language in which it is implemented.
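As a rough illustration of that kind of evolution, the sketch below adds an optional field to an application type and updates the hand-written encoder and decoder to match. Everything here is hypothetical (the `value` tree, the `point` type, the field name `z`); it is not Orsetto's API, only an example of how the serialization functions must follow a change in the data type.

```ocaml
(* Hypothetical sketch: a generic value tree, not an Orsetto type. *)
type value =
  | Int of int
  | Record of (string * value) list

(* Version 2 of the application type adds an optional field [z]. *)
type point = { x : int; y : int; z : int option }

(* The encoder omits [z] when absent, so version 1 readers that
   ignore unknown fields can still read what it produces. *)
let encode_point p =
  let base = [ ("x", Int p.x); ("y", Int p.y) ] in
  let fields =
    match p.z with
    | None -> base
    | Some z -> base @ [ ("z", Int z) ]
  in
  Record fields

(* The decoder accepts version 1 messages (no "z" field) as well as
   version 2 messages, and rejects a "z" of the wrong kind. *)
let decode_point = function
  | Record fields ->
    (match List.assoc_opt "x" fields, List.assoc_opt "y" fields with
     | Some (Int x), Some (Int y) ->
       (match List.assoc_opt "z" fields with
        | None -> Some { x; y; z = None }
        | Some (Int z) -> Some { x; y; z = Some z }
        | Some _ -> None)
     | _ -> None)
  | _ -> None
```

Note that every use of the `point` record in application code also has to account for the new field, which is the coupling between protocol and programming language described above.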

What sets Orsetto apart from other systems that are agnostic about concrete data modeling languages is that it contains an internal abstract data modeling language implemented as an OCaml library (see the Cf_data_render and Cf_data_ingest modules). While it’s a bit unwieldy to use directly at the moment, it’s really intended for use in implementations of concrete data modeling languages, e.g. ASN.1, CDR, G-Protobuf, YANG, CDDL, et cetera.

I do hope eventually to write a PPX extension (when PPX is a more stable platform) that derives Orsetto data models for most ordinary type specifications. At that point, it should be a reasonably straightforward process to define a new interchange language in terms of its Orsetto abstract data model, then to add support for it to the PPX via a plug-in architecture.
