[ANN] ocaml-protoc 3.0

Dear camlidae,

I’m happy to announce the release of ocaml-protoc 3.0 (alongside the multiple runtime libraries: pbrt, pbrt_yojson, and the new pbrt_services). This is a major breaking release. I’m sorry for that (I do, however, believe it necessary), and I recommend that every user of ocaml-protoc add an upper bound < 3.0 to their current project and migrate when they have time.

First, a summary. ocaml-protoc is a self-contained compiler that turns protobuf IDL files (.proto files) into OCaml types, pretty-printers, and (de)serialization functions. The runtime library pbrt (“protobuf runtime”) contains support code for printers and binary (de)serialization; pbrt_yojson contains support code for JSON (de)serialization by way of yojson.
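To make the summary concrete, here is a rough, hand-written sketch of the kind of OCaml that could correspond to a small message. The `Person` message, the `default_person` constructor, and the `pp_person` printer are assumptions chosen for illustration; this is not actual ocaml-protoc output, and the real generated names and signatures may differ.

```ocaml
(* Hypothetical input, example.proto:

     message Person {
       string name = 1;
       int32 id = 2;
     }

   Everything below is a hand-written approximation, not generated code. *)

type person = {
  name : string;
  id : int32;
}

(* Constructor using protobuf-style default values for omitted fields. *)
let default_person ?(name = "") ?(id = 0l) () : person = { name; id }

(* Pretty-printer in the usual Format style. *)
let pp_person fmt (p : person) =
  Format.fprintf fmt "{ name = %S; id = %ld }" p.name p.id

let () =
  let p = default_person ~name:"Ada" ~id:1l () in
  Format.printf "%a@." pp_person p
```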

So what changed with ocaml-protoc 3.0? Many things.

For a start, from one .proto file we now generate one pair of .ml and .mli files instead of several pairs. This reduces the boilerplate in build systems and simplifies user code overall (one module per .proto file). A large internal refactor of ocaml-protoc was done prior to the integration of… services.

The major new feature of ocaml-protoc 3.0 is the support for service declarations. These are essentially a way to describe RPC endpoints, next to the types used to interact with the endpoint (example; full generated code). This is typically what is used in systems such as gRPC. Now ocaml-protoc generates server and client stubs for each endpoint that pack together the type definitions and the relevant (de)serializers; that code doesn’t presume anything about a concrete RPC system. I have in the works a simple Twirp OCaml library that relies on this generated code to provide services over HTTP 1.1; it is also possible to write RPC systems over ZMQ, websockets, etc. without changes to the generated code.
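As a rough illustration of what "stubs that don't presume a concrete RPC system" can mean, here is a minimal sketch. The `client_rpc` record, the `transport` type, and the `Calculator` service are hypothetical names invented for this example; they are not the pbrt_services API, and the string-based codec stands in for real protobuf (de)serializers.

```ocaml
(* Hypothetical types: NOT the actual pbrt_services API. *)

(* A client-side stub: the endpoint's name plus the (de)serializers
   for its request and response types. *)
type ('req, 'res) client_rpc = {
  service_name : string;
  rpc_name : string;
  encode_req : 'req -> string;
  decode_res : string -> 'res;
}

(* A transport maps a path and a request body to a response body.
   HTTP, ZMQ or websockets could all be wrapped to fit this shape. *)
type transport = path:string -> body:string -> string

(* Generic call: works with any stub and any transport. *)
let call (transport : transport) (rpc : ('req, 'res) client_rpc) (req : 'req)
    : 'res =
  let path = Printf.sprintf "/%s/%s" rpc.service_name rpc.rpc_name in
  rpc.decode_res (transport ~path ~body:(rpc.encode_req req))

(* Stand-in for a generated stub; a real one would use protobuf encoders. *)
let add : (int * int, int) client_rpc = {
  service_name = "Calculator";
  rpc_name = "add";
  encode_req = (fun (a, b) -> Printf.sprintf "%d,%d" a b);
  decode_res = int_of_string;
}

(* In-memory transport playing the role of the server. *)
let loopback : transport = fun ~path ~body ->
  match path, String.split_on_char ',' body with
  | "/Calculator/add", [ a; b ] ->
    string_of_int (int_of_string a + int_of_string b)
  | _ -> failwith "unknown endpoint"

let () = Printf.printf "%d\n" (call loopback add (2, 3))
```

The point of the design is that `add` knows nothing about `loopback`: swapping in a different transport requires no change to the stub.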

Another big-ish change is what the generated code looks like, at least when it comes to binary (de)serialization. ocaml-protoc 3.0 comes with significant speedups for encoding (up to twice the throughput; an order-of-magnitude reduction in allocations in some cases[1]) and some less impressive speedups for decoding. This is a combination of multiple changes:

  • use of a few C stubs to accelerate varint decoding/encoding;
  • encoding is done back-to-front, which allows the encoder to use a single slice internally[2]. This is what required changes in the generated code in the first place;
  • encoding code now requires a lot fewer closures (passing arguments explicitly instead) which reduces allocations to almost nothing.
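The back-to-front idea can be sketched in a few lines. This is a simplified illustration, not pbrt's implementation: the `encoder` record and the function names are assumptions, the buffer has a fixed size (a real encoder would grow it), and only non-negative varints are handled. The key step is in `add_nested`: the payload is written first, so its size is already known when the length prefix is prepended.

```ocaml
(* Hypothetical encoder: NOT pbrt's implementation. Bytes are written
   from the end of a fixed-size buffer toward the front. *)
type encoder = {
  buf : Bytes.t;
  mutable start : int; (* index of the first byte written so far *)
}

let create n = { buf = Bytes.create n; start = n }

(* Number of bytes in the varint encoding of a non-negative int. *)
let rec varint_size n = if n < 0x80 then 1 else 1 + varint_size (n lsr 7)

(* Prepend a varint: reserve exactly its size, then fill the reserved
   bytes in forward order (7 bits per byte, high bit = "more follows"). *)
let add_varint e n =
  let size = varint_size n in
  e.start <- e.start - size;
  let n = ref n in
  for i = 0 to size - 1 do
    let low = !n land 0x7f in
    n := !n lsr 7;
    let byte = if i < size - 1 then low lor 0x80 else low in
    Bytes.set e.buf (e.start + i) (Char.chr byte)
  done

(* Prepend a length-delimited sub-message: payload, then size, then key. *)
let add_nested e ~field (write_payload : encoder -> unit) =
  let end_ = e.start in
  write_payload e;
  add_varint e (end_ - e.start); (* the size is only known at this point *)
  add_varint e ((field lsl 3) lor 2) (* wire type 2 = length-delimited *)

let contents e = Bytes.sub_string e.buf e.start (Bytes.length e.buf - e.start)

let () =
  let e = create 16 in
  (* Outer field 3 holds a sub-message whose field 1 is the varint 150.
     Within the payload, the value is written before its key. *)
  add_nested e ~field:3 (fun e ->
      add_varint e 150;
      add_varint e ((1 lsl 3) lor 0));
  String.iter (fun c -> Printf.printf "%02x " (Char.code c)) (contents e);
  print_newline ()
```

This prints `1a 03 08 96 01`, the same bytes a front-to-back encoder would produce for that message, but no space had to be reserved or shifted for the length prefix.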

I haven’t recently benchmarked against other protobuf implementations in OCaml, but I’m reasonably confident that this is now the fastest one by a healthy margin.

There are also other improvements and bugfixes. I want to thank in particular @Konstantin_Olkhovski for some of these contributions and for very helpful discussions, and also @VPhantom for more discussions on the topic of performance.

The changelog contains many more details.


  1. If the encoder is reused, there are almost no minor allocations, and no major allocations, when encoding an existing value into the encoder’s buffer.

  2. Because sub-messages use a varint for their size, encoding front-to-back cannot be done efficiently in a single buffer: it’s not clear how many bytes to reserve in front of a sub-message. With back-to-front encoding that’s not an issue.

This kicks some serious behind. Thank you Simon! :tada:

This sounds fantastic, @c-cube, thanks for your work. Now to try it out with ocaml-grpc to see how it performs.

Can you comment on the feature comparison to ocaml-protoc-plugin (issuu/ocaml-protoc-plugin on GitHub)?

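| Feature           | ocaml-protoc | ocaml-protoc-plugin | ocaml-pb        |
|-------------------|--------------|---------------------|-----------------|
| OCaml types       | Supported    | Supported           | Defined runtime |
| Service endpoints | Ignored      | Supported           | N/A             |
| proto3            | Supported    | Supported           | Supported       |
| proto2            | Supported    | Supported           | Supported       |
| proto2 extends    | Ignored      | Supported           | Supported       |
| proto2 groups     | Ignored      | Not supported       | ?               |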

Is that comparison missing something you feel is important to consider when choosing between the two libraries?

Well, the thing that has changed is that services are now supported by ocaml-protoc. However, it is quite possible that ocaml-protoc-plugin is more compliant with what Google’s protobuf does: in part because it’s a protoc plugin; in part because ocaml-protoc doesn’t really care for the details of what presence means for proto3. More specifically, fields that are not marked as optional are always written.
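To illustrate the presence point, here is a small sketch contrasting the two behaviours for a single integer field. The `encode_field` helper and its `always_write` flag are hypothetical and come from neither library; the sketch only assumes the proto3 convention that a field with implicit presence (no `optional`) is left out of the output when it holds its default value.

```ocaml
(* Hypothetical helpers: code from neither ocaml-protoc nor
   ocaml-protoc-plugin. Only non-negative values are handled. *)

(* Append a varint to a buffer, front-to-back. *)
let add_varint buf n =
  let n = ref n in
  while !n >= 0x80 do
    Buffer.add_char buf (Char.chr ((!n land 0x7f) lor 0x80));
    n := !n lsr 7
  done;
  Buffer.add_char buf (Char.chr !n)

(* Encode one integer field.
   ~always_write:true  : the field is emitted even when it is 0.
   ~always_write:false : a default value (0) is omitted. *)
let encode_field ~always_write ~field v =
  let buf = Buffer.create 8 in
  if always_write || v <> 0 then begin
    add_varint buf (field lsl 3); (* key with wire type 0 = varint *)
    add_varint buf v
  end;
  Buffer.contents buf

let () =
  let show s = String.iter (fun c -> Printf.printf "%02x " (Char.code c)) s in
  Printf.printf "always written: ";
  show (encode_field ~always_write:true ~field:1 0);
  Printf.printf "\ndefault omitted: ";
  show (encode_field ~always_write:false ~field:1 0);
  print_newline ()
```

Both outputs decode to the same value, since a missing field reads back as its default; the difference is in the bytes on the wire.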

This looks really promising, especially the support for services! We have a very recent PR in ocaml-grpc (“explore API with decode/encode in the library” by mbarbin, Pull Request #48, dialohq/ocaml-grpc on GitHub) to reduce boilerplate and make gRPC services more typesafe. The PR was initially aimed at ocaml-protoc-plugin but it is general enough that it could allow for very good integration with ocaml-protoc as well.
