[ANN] Protocell 1.0.0: Yet another plugin for the Protobuf Compiler

Hello everyone!

I’ve recently released the first version of Protocell, which offers yet another way of generating OCaml code from .proto files.

Feature highlights:

:camel: Full support for all proto3 primitive and user-defined types.
:camel: Supports imports and generates one .ml file per .proto file.
:camel: Automagically supplies code for imports of Google’s “well-known types” when needed.
:camel: Concise yet comprehensive test framework that cross-checks serializations with those of protoc itself.
:camel: Fully bootstrapped: Protocell uses Protocell-generated code to interact with protoc.
:camel: Lean on dependencies, especially when it comes to the runtime library.
:camel: Supports OCaml compiler versions 4.04.1 and above.
:camel: Decent proto2 support.
:camel: Can generate and parse protoc's text format (mostly for testing and debugging purposes).

More information and example code can be found at the project’s homepage.

I’m still just a newbie when it comes to OCaml, its tooling and ecosystem. I warmly welcome any sort of input or feedback.

Adding a trivial example to README would be really helpful to quickly compare with other implementations.

Thanks, that makes a lot of sense. I just added an example to the README.

Nice! I hear tell that the protobuf text-mode format is used at certain companies as the format of config-files, so that nobody ever has to write a config-file parser again, and they get a certain level of version-to-version compatibility built-in.

Lovely! I used to use JSON for config-files, now I can switch to text-mode protobufs for my next project!

It’s a trap. Pretty soon you find yourself generating configuration files from protobuffer messages by automation, and you quickly discover that writing messages to text-mode can lose precision and alter text, depending on the language implementation and other factors.
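
To make the precision point concrete, here is a minimal sketch of one way a text-mode round trip can change a double. The two printing functions are hypothetical and are not taken from Protocell, protoc, or any particular language binding; they only assume that some writer prints floats with a short default format while another prints 17 significant digits.

```ocaml
(* Hypothetical illustration only: two ways a text-format writer might
   print a double. Neither function is taken from Protocell or protoc. *)
let short_repr (x : float) : string = Printf.sprintf "%g" x
let exact_repr (x : float) : string = Printf.sprintf "%.17g" x

(* Write a value out as text, then parse that text back into a float. *)
let round_trip (to_text : float -> string) (x : float) : float =
  float_of_string (to_text x)

let () =
  let x = 0.1 +. 0.2 in
  (* Report whether the value read back is identical to the original. *)
  Printf.printf "short: %s  survives: %b\n"
    (short_repr x) (round_trip short_repr x = x);
  Printf.printf "exact: %s  survives: %b\n"
    (exact_repr x) (round_trip exact_repr x = x)
```

With the short format the value no longer compares equal after being read back, while 17 significant digits is enough to recover any double exactly. Whether a given implementation behaves like the first or the second function is exactly the kind of thing that can differ between languages.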

How interesting. I would have said that

> Pretty soon you find yourself generating configuration files from protobuffer messages by automation,

was a great thing, no? I did it in my blockchain project (generated JSON) and found it really valuable. So instead of parsing an INI file into an INI object, and then having to either write interface code to access the INI object, or write a converter to turn that INI object into the actual OCaml object that was used to drive configuration, the marshaller demarshalled right into the OCaml object directly. Much better, and much more fool-proof.

It seems like you’re really pointing out that the language-specific bindings for some particular wire-format might not be adequate (not the identity function) and hence, that wire-format might not be a good config-file format? But shouldn’t this critique apply to all config-file-parser libraries? They can all suffer from this problem, right? And in the case of config-file-parser libraries, they’re neither used as heavily as an RPC/marshaller system, nor are they as fully-specified and thought-through. Both of these would seem to argue for using an existing marshaller-library, if possible.

As I noted above, I did this already using JSON (and the lovely ppx_json de/serializers) and it was great! The only problem is that the particular way in which they generate JSON doesn’t correspond exactly to the way that other languages’ JSON libraries want to parse/generate JSON. In short, JSON isn’t a first-class wire-format for data-transmission.

I would have thought that protobufs remedied this problem, or more precisely, that a protobuf language binding for language X (e.g. OCaml) that did not guarantee identity and deal with all the aspects of the wire-format was simply not a valid/correct/conforming binding?

Is there something I’m missing?

That’s very interesting! I never thought the text format would be useful for anything but testing and perhaps debugging.

In its current state it should function fine but I can think of these two caveats:

  1. The serialization code puts everything on a single line and there’s no way to customize that.
  2. The deserialization code could be a lot faster. Right now it re-tokenizes each nested message as it goes, which is quite wasteful.

I’m not sure these are issues for you but I can imagine they might be. Do let me know if there’s an optimization or feature you need (or just open an issue in the issue tracker).