gRPC server and client in OCaml

EDIT: I hacked together a working, though incomplete, OCaml gRPC implementation using the ocaml-h2 library.

TL;DR: https://github.com/blandinw/ocaml-grpc-envoy/

Hey, I’m new to OCaml after writing some Clojure, C++ and Haskell in various contexts, including working at FB (relevant below).

After browsing this forum and Reddit for a bit, the assumption seems to be that OCaml is not a good fit for gRPC, since there’s no pure implementation today. Now, this is something I have experience with, so I thought I’d try and challenge this assumption.

As you may know, services inside FB use Thrift (both the format and protocol) to communicate. The Thrift team worked primarily in C++ (for good reasons), causing support for other languages to lag behind despite their best efforts. Now, the interchange format (equivalent to Protobuf) does not change very often so it’s fine to have a per-language implementation, but the client and server (equivalent to HTTP2 + gRPC) frequently receive new features, optimizations and fixes. After a valiant and continued effort to support most languages used internally, the Thrift team came up with an idea. Instead of maintaining multiple implementations and dealing with obscure FFI bugs, FingerprintTrustManagerFactorys and whatnot, they would focus solely on the C++ implementation and provide a daemon to be run alongside whatever code you were trying to run. You could then use simple IPC to exchange Thrift (the format) messages with that daemon, and it would handle all the nitty-gritty of running a service at scale (load balancing, connection pooling, service discovery, security, retries, timeouts, network stats, hot restarts, etc.). Needless to say, it worked remarkably well even at very high scale and everybody was much happier.

I wanted to replicate this idea with OCaml and gRPC. We already have support for protobuf thanks to the excellent ocaml-protoc. All we need is a way to exchange protobuf messages reliably on the wire. Instead of having an OCaml implementation that will have to stay up-to-date and have its own set of bugs (the official grpc/grpc-java repo has 4450 commits and 2400 issues at the moment), can we reuse existing infra with already massive support and production time?

Fortunately, the people at Lyft built just that, open-sourced it and contributed it to the Cloud Native Computing Foundation in late 2017. It is called Envoy and it is bliss.

I demonstrate how to fit these pieces together at blandinw/ocaml-grpc-envoy to build a simple KV store, including a gRPC client and server in 200 lines of OCaml code. The idea is to spawn an Envoy process that will handle all gRPC communication for our OCaml code. We use HTTP/1.1 to exchange Protobuf messages with it, using for example httpaf and Lwt. This solution has the added benefit that it is highly scalable from the start, allowing you for instance to spawn one OCaml process per core and load balance between them. You can also use Envoy (with proper config!) as your web reverse proxy instead of, say, nginx.

At the very least, this solution allows us to start writing gRPC code today, and gracefully evolve towards HTTP/2, Multicore and maybe a native OCaml implementation later.

I’m curious to hear your perspective on the future of building services with OCaml, or your past experience like what went well, what was missing, etc.

16 Likes

Fantastic idea. So if I understand correctly, the only thing that Envoy (server-side) is doing is translating the Protobuf from gRPC HTTP2 transport to HTTP1, and forwarding these Protobuf objects over HTTP1 to the OCaml server? Envoy doesn’t need to know about the actual gRPC schema, because it doesn’t touch the Protobuf objects themselves, right?

1 Like

This is an excellent idea; some thoughts.

(0) I’m curious about your experience with FBThrift. As I’m sure you’re well-aware, it diverged significantly from Apache Thrift. I built it a few years ago (after they OSSed it a second time) and boy howdy it was incredibly involved to first get all the prerequisite modules, and then get them at the required version/commit IDs, and then to build the thing. I eventually built it, and it worked, but wow.

Also, FBThrift … they don’t seem to ever have met a use of C++ templates, and specifically template metaprogramming, that made them say “no, this far and no further!” I was shocked at how clever the code was. Just … so much use of cleverness.

(1) First, this is a great idea, and for the vast majority of uses of RPC that don’t need high-performance, this is great. [by high-performance, I mean overhead measured in microseconds; in such situations, even context-switching matters, not to speak of extra copies]

(2) Since Envoy is in C++, it could eventually be turned into a DLL, which could be run in-process with memory-transports back-and-forth to the host language. This would reduce the performance overhead.

(3) It’s a pity that you’re using HTTP to talk to Envoy – is it not possible to use Thrift or some other efficient wireline?

1 Like

That’s correct. Envoy is only concerned with transporting bytes (along with load balancing, routing, etc, etc). Only OCaml knows about the Protobuf schemas.

In the OCaml server case, Envoy listens for HTTP/2 gRPC requests, accesses the bytes payload with no knowledge of the actual schema/layout and repackages these same bytes in an HTTP/1.1 request that OCaml can process. OCaml then responds with bytes (an encoded Protobuf response message) that Envoy sends back on the original HTTP/2 connection.

1 Like

interesting read, thanks! :slight_smile:

1 Like

(0) Well, ironically enough, I’m not too familiar with Apache Thrift so it’s hard to comment on the divergence between the two.

Regarding building and using FBThrift, it is a much different story internally, since there is a monorepo with all dependencies vendored and a clean dependency graph. You literally just add one line of dependency to your project and go get a coffee (or two).

However, I’m not surprised by what you’re saying, and this has been the duality of FB opensource. On the one hand, they put out very high quality software (Proxygen/Wangle, LogDevice, wdt, buck, etc.) but on the other hand, historically they didn’t seem to care nearly as much about documentation and developer friendliness. That being said, I just checked the few repos I mentioned and FBThrift, and the situation seems to have improved a lot, even including an “Easy builds for Facebook projects” directory that uses Docker to provide a standard build env.

Regarding cleverness… yeah, I guess after years of eye-parsing C++, your tolerance for complexity is not that of a mere human. From my experience, the fresh out-of-school engineers were the “cleverest”. C++ folks tended to be more mature, pragmatic infra people and, “feedback being a gift”, they kept the younglings in check while producing the highest quality codebases. I’ve seen unspeakable things in machine learning Python metaprogramming code though (not PyTorch), and it makes me cherish OCaml’s explicitness.

(1+2+3) Agreed. I’d like to say that there is a difference between high performance and high scalability. I’d argue that this is a good step towards a highly scalable architecture, but we have to trade some performance for it (probably tiny as you say). Unfortunately, HTTP/1.1 is the only way (other than HTTP/2) to communicate with Envoy today as far as I know. Note that the payload is still a binary black box, and Envoy only parses the HTTP headers to route the request.

Hopefully, if one cares about microseconds, they know enough C++ to implement a more efficient way to interact with Envoy.

I like the DLL idea. It would also allow us to package everything in a single library and even reuse Envoy’s HTTP2 code, getting rid of HTTP/1.1 entirely. The OCaml logic may also be embeddable in Envoy itself as a “filter”.

1 Like

Are there coding standards in FB, like there are at Google? The Google C++ style guide (last I checked, a few years ago), for example, pretty much forbids C++ template metaprogramming. I’ve done a bit of it myself, and it’s incredibly powerful stuff.

As an example, I looked at the JSON wireline for FBThrift, and hooboy, it was pretty damn difficult for me to figure out what it was doin’ there. Eventually I gave up, and just built/ran it, and observed that, yeap, they actually emitted correct JSON on the wire (unlike Apache Thrift’s JSON encoding, which might be better described as “text, with curly-braces”).

1 Like

There are definitely guidelines (and linters, static analysis, etc) on how to write C++ at FB, but these kinds of things were usually decided at the team, or organization, level. A bit like federal law and state law in the U.S. That’s why you see so many different languages at FB (our team used Clojure and Haskell among other things, FB Chat was originally in Erlang, etc.). That being said, you’re incentivized to follow the general guidelines, because that’s how you get the best support from infra teams. Also, the more important the code, the more scrutiny it is put under, and you know security folks don’t like unreadable code.

Regarding templates in C++, it is often the best way to do something efficiently. I’ll use templates over virtual methods any day of the week. It can be very elegant and simple™️ if used well. It can be a vision of despair if left unchecked. I couldn’t find the JSON code you’re talking about so they may have rewritten it, this seems clean enough, for C++ of course.

I laughed at text with curlies :joy:

(1) Yeah, I agree with you about using templates – it’s an excellent thing. I meant (and the Google C++ Style Guide specifically calls out) “template metaprogramming”. That is, the use of template matching as a sort of “logic-programming” to select which template will be applied. That stuff is … well, it’s crack cocaine. I love it, and have used it to excellent effect, but hooboy I wouldn’t want to be the guy who comes across some code written that way, that I gotta debug/maintain.

(2) Re: FB guidelines. Fascinating. So there are no company-wide coding standards. At Google, I remember that this was one of the first things drilled into new hires (on day two): “you will write code to the style guide, and that is because any sufficiently experienced engineer in the company should be able to pick up the code and understand it without too much effort”. It was a big, big deal.

(3) Re: JSON protocol, it wasn’t the header files, but rather the generated code, that had templates all over it. Concretely, in order to serialize (say) “list” or “map<string,T>” (etc), they used template metaprogramming to generate all the various iteration bits.

1 Like

(1-3) Ha, I misread that. Definitely something to be wary of, and I’ve had my share of “well, it seems to work but it’s complex enough that I’ll look for something else” moments. One example was folly::gen. It would definitely be frowned upon by most people. Generated code is fair game though :slight_smile:

(2) To be clear, I meant that there are some guidelines at the company level, but ultimately, you can’t stop a team from pushing crappy code that only they own, so the teams are responsible for their code style and tech choices. If you dare push code to a codebase that is used by many people however, you can bet that the review-bot will automatically add 20 people (among them 10 blocking reviewers) to your pull request and someone will let you know how they feel about your fancy metaprogramming.

It’s very interesting to hear about the onboarding at Goog. It probably helped a lot with ramping up on new codebases, which could be an issue at FB. However, I’ve never felt limited or forced to do anything (except security and privacy stuff that were unconditionally enforced of course) because of some company-wide guideline. Now that I think of it, a lot was communicated through “peer pressure”. There even was an internal group where the worst pieces of code were posted (screenshots with no names attached) to make fun of anti-patterns and educate engineers at the same time.

Heh, yeah, the “developer education” the first week was … comprehensive. I still remember the first morning, the guy says:

“OK, everybody here who’s starting in Android, raise your hands”
“Alright, everybody else (and also for the Android folks, but esp. everybody else), I got news for you: you are your own test team. There is no QA, except what you put into your own tests. And if you break the build, if the build-bot sends you a build-break, you have FIVE MINUTES to fix it and get the build working again”.

Needless to say, there were comprehensive tests everywhere that typically guaranteed (assuming that devs ran/passed them) that the build didn’t break.

But they did an excellent job of scaring the bejesus out of us, to ensure that we all did crazy testing. And it really worked, gotta say. The coding style guides were great, too. I was able to wander over basically any code in the system, and understand it (at the level of -systems-, not machine learning, or compiler algorithms, of course).

1 Like

I know of one project at Facebook that was originally written in Clojure and then converted to Haskell. Did you happen to work on NLP stuff by any chance :smile:?

1 Like

One more question … in the OCaml server code you handle the following endpoints:

  • GET *
  • POST /kv.KV/Get
  • POST /kv.KV/Set

How do the latter two get mapped from gRPC calls to HTTP requests? Does Envoy have some built-in rules for this mapping? Or is it something you defined?

1 Like

There is a pure h2 implementation in OCaml: https://github.com/anmonteiro/ocaml-h2
I think what’s missing for gRPC is streaming headers or something like that, @anmonteiro can probably fill in more precisely.

But could we use the h2 implementation as the OCaml backend and get slightly better performance out of this stack?
In any case this is super interesting stuff!

2 Likes

Ha. You got me :duck:. Hopefully, you’re a happy user of this project :slight_smile:

1 Like

This mapping is part of the protocol description.

Path → “:path” “/” Service-Name “/” {method name} # But see note below.

While Envoy could technically rewrite any part of the request, including the path, it tries to keep it as close as possible to the original HTTP/2 request.
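
To make the mapping concrete, here is a minimal sketch in plain OCaml (standard library only) of how a path such as /kv.KV/Get can be built from, and split back into, a service name and a method name. The names grpc_path, parse_grpc_path, route and the route type are made up for illustration; they are not taken from the repo, and a real server would dispatch from its HTTP library's request type rather than from bare strings.

```ocaml
(* Build the HTTP path for a gRPC call: "/" Service-Name "/" method-name.
   [grpc_path] is a hypothetical helper name. *)
let grpc_path ~service ~meth = Printf.sprintf "/%s/%s" service meth

(* Split such a path back into (service, method).
   Returns None when the path does not have exactly that shape. *)
let parse_grpc_path path =
  match String.split_on_char '/' path with
  | [ ""; service; meth ] when service <> "" && meth <> "" ->
    Some (service, meth)
  | _ -> None

(* Hypothetical dispatch for the two KV endpoints listed above. *)
type route = Kv_get | Kv_set | Unknown

let route ~http_method ~path =
  match http_method, parse_grpc_path path with
  | "POST", Some ("kv.KV", "Get") -> Kv_get
  | "POST", Some ("kv.KV", "Set") -> Kv_set
  | _ -> Unknown
```

For example, `route ~http_method:"POST" ~path:(grpc_path ~service:"kv.KV" ~meth:"Set")` evaluates to `Kv_set`, while any path with a different shape falls through to `Unknown`.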

Hey, I actually realized that this morning! I had somehow missed it when reading about services in OCaml. It’s a great library too :slight_smile:

As a result, I pushed a new branch “http2” where I used the ocaml-h2 library instead of httpaf. I implemented just enough gRPC on top of it to make things work, but it’s definitely not up-to-spec, and I had to use Obj.magic to access some private code from h2.
In this version, Envoy is optional as we don’t need translating to and from HTTP/1.1 anymore!

@anmonteiro, could you look at my commit to see what private functions I had to use and maybe consider how we can have the required functionalities public?

In any case, it shows that we’re very close to the simplest possible gRPC implementation in pure OCaml. We just need to abstract that into a library and finish implementing the gRPC protocol, which shouldn’t be too hard.
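
For a sense of how small the core of the protocol is, here is a rough sketch of the message framing described in the gRPC over HTTP/2 specification: each message on the wire is a 1-byte compressed flag, a 4-byte big-endian length, then the serialized Protobuf bytes. The function names frame and unframe are hypothetical, this is not the code from the http2 branch, and the sketch ignores compression itself and assumes the whole frame is already in memory.

```ocaml
(* Frame one serialized message:
   1-byte compressed flag, 4-byte big-endian length, then the payload. *)
let frame ?(compressed = false) (payload : string) : string =
  let len = String.length payload in
  let buf = Bytes.create (5 + len) in
  Bytes.set buf 0 (if compressed then '\001' else '\000');
  Bytes.set buf 1 (Char.chr ((len lsr 24) land 0xff));
  Bytes.set buf 2 (Char.chr ((len lsr 16) land 0xff));
  Bytes.set buf 3 (Char.chr ((len lsr 8) land 0xff));
  Bytes.set buf 4 (Char.chr (len land 0xff));
  Bytes.blit_string payload 0 buf 5 len;
  Bytes.to_string buf

(* Decode one frame from the start of [data].
   Returns (compressed flag, payload, remaining bytes),
   or None if [data] does not yet hold a complete frame. *)
let unframe (data : string) : (bool * string * string) option =
  let total = String.length data in
  if total < 5 then None
  else
    let compressed = data.[0] = '\001' in
    let len =
      (Char.code data.[1] lsl 24)
      lor (Char.code data.[2] lsl 16)
      lor (Char.code data.[3] lsl 8)
      lor Char.code data.[4]
    in
    if total < 5 + len then None
    else
      Some (compressed, String.sub data 5 len,
            String.sub data (5 + len) (total - 5 - len))
```

Returning the remaining bytes from unframe is a design choice that lets a caller loop over several messages in one buffer, which is what a streaming call would need.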

Regarding using Envoy, it’d be interesting to measure the performance impact, and you clearly don’t need it for development and low-volume services. However, I would still highly recommend it for any serious production use case, because it gives you a way to scale and you will want what I mentioned in my original post: load balancing, connection pooling, service discovery, security, retries, timeouts, network stats, dynamic routing updates, hot restarts, etc.

5 Likes

am I dreaming?
a future where ocaml grpc clients and servers are generated from protobuf files?
is this actually happening?
and http2?

1 Like

This is very interesting. It’s really cool to see some bare-bones examples like this. Another way to go about gRPC + OCaml is interop with grpc core. Rust’s grpc-rs does that.

2 Likes

Oh, excellent! I know there are reasons for reusing existing GRPC assets, but still … the opacity of GRPC, the central direction by one company, the inability of outsiders to influence anything about it (no matter how valid their criticisms) are all good reasons to applaud a from-the-ground-up implementation.

Bravo!

1 Like