Simple, modern HTTP client library?

From the README of http-lwt-client:

Its dependency cone is small, and does not include any wrappers or frameworks.

Above:

Due to other projects in my pocket, I used OCaml-DNS and OCaml-TLS, but I’m sure it’s pretty straightforward to (a) implement happy-eyeballs using Unix.getaddrinfo (or Lwt_unix); (b) use an OpenSSL binding instead of OCaml-TLS. But to be honest, what would we gain from it?
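For readers wondering what option (a) might look like, here is a minimal sketch using only the `Unix` module from the standard distribution. It is not how http-lwt-client works, and it is not real happy-eyeballs: it resolves with `Unix.getaddrinfo`, puts IPv6 candidates first, and tries them one after another with blocking connects and no timeout. The names `prefer_ipv6`, `connect_first` and `connect_host` are made up for illustration.

```ocaml
(* Sketch only: sequential connection attempts over getaddrinfo results.
   Real happy-eyeballs (RFC 8305) races IPv6 and IPv4 attempts with a short
   stagger; this version only shows the resolution and fallback steps.
   Requires linking the unix library. *)

(* Put IPv6 candidates first, keeping resolver order within each family. *)
let prefer_ipv6 (addrs : Unix.addr_info list) : Unix.addr_info list =
  let v6, rest =
    List.partition (fun ai -> ai.Unix.ai_family = Unix.PF_INET6) addrs
  in
  v6 @ rest

(* Try each candidate in turn and return the first socket that connects.
   Connects are blocking and have no timeout in this sketch. *)
let rec connect_first = function
  | [] -> Error "no address could be reached"
  | ai :: rest ->
    (match
       Unix.socket ai.Unix.ai_family ai.Unix.ai_socktype ai.Unix.ai_protocol
     with
     | exception Unix.Unix_error _ -> connect_first rest
     | sock ->
       (try
          Unix.connect sock ai.Unix.ai_addr;
          Ok sock
        with Unix.Unix_error _ ->
          Unix.close sock;
          connect_first rest))

(* Resolve [host] and connect to [port] over TCP. *)
let connect_host host port =
  Unix.getaddrinfo host (string_of_int port)
    [ Unix.AI_SOCKTYPE Unix.SOCK_STREAM ]
  |> prefer_ipv6
  |> connect_first
```

Something like `connect_host "example.org" 443` would return `Ok socket` or an `Error` once every candidate has failed. Adding timeouts and concurrent attempts is where most of the real work lies.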

In the table below http-lwt-client has the largest dependency tree - I was obviously surprised by this. An HTTP client library for me is primarily a tool to solve another problem and not a goal in itself, and I’m not keen on forking it to make changes and ending up with another problem.

I disagree that trading OCaml dependencies against system (OS-level and distro-level) dependencies is not an advantage: the OS and libraries like curl or OpenSSL have a bigger community behind them and are less likely to create incompatibilities that prevent updates compared to a smaller OCaml ecosystem that is moving towards OCaml 5 and effect handlers. How many of the 92 dependencies could miss a step here? I chose ezcurl for now.

I would like to challenge you on that. There are three obstacles to replacing some complex C/C++ library with OCaml code:

  1. Often these are complex libraries, and even reimplementing them requires high-skill individuals. It isn’t as simple as transliteration. Those individuals are in short supply, and for example in the case of the re library, Jerome Vouillon is a scarce resource. And yet, re is an incomplete implementation of at least Perl regexps. re is supposed to implement multiple different families of regexps: this is an enormously hard target, and a massive expenditure of time will be needed to finish this task.

Whereas if one could find C libraries to wrap, the time expended could be much, much less.

  2. These libraries sometimes involve specialized knowledge and experience. For example, the leveldb/rocksdb libraries embody enormous experience with fault-tolerance and high-performance for database/key-value store subsystems, and the idea that you’re going to find an extremely skilled OCaml hacker who is also a deeply experienced database engineer is … well, that’s not going to happen.

So you’ll get substandard implementations on one or both of those axes.

  3. And last, I actually question whether you’re going to get more security by going with a pure OCaml implementation of (e.g.) some security code. TLS is tricky, and you have to get it just right. Perhaps what you mean by “security” is “pointer-safety” and “type-safety”. But for many complex libraries, that is merely the beginning of what counts as security and safety.

What you’re sacrificing is the eyes of many, many people on the C/C++ library, and hence the bug-finding and -fixing that those people will perform. This, BTW, is a reason to not use some fancy hot-off-the-presses TLS implementation that some guy wrote in C++, even if it looks waaay spiffy. B/c without a large number of users and hence some expert users who will investigate that code, you’re signing up for taking all that risk on yourself. And this translates mutatis mutandis to OCaml.

Although just because a library is used by many, many people, it doesn’t necessarily mean that (m)any are looking at the code. That was the case with OpenSSL and Heartbleed, where I think everyone assumed that someone else surely would’ve been looking at, auditing and improving OpenSSL, but the reality was entirely different. The situation has improved since that bug and OpenSSL is getting a lot more attention these days, so I’m much happier using OpenSSL as a dependency now.

However, being used by many doesn’t necessarily mean that the library is free of even the most trivial bugs. I tried using SDL2 recently and got 2 segfaults almost immediately (one upon creating the window, another by just moving the mouse), both of them bugs in SDL2 itself (their Wayland implementation was somewhat broken). To be fair they were fixed pretty quickly (this is where being used by many is an advantage: it also has a large pool of contributors).

True, which is why I’m often undecided between using an OCaml library with a C binding (and increasing the risk that I’m going to have to debug segfaults and memory corruption bugs, either from the C library or the binding), or an OCaml library where I don’t have to debug those bugs, but may have different ones due to the smaller pool of contributors, maturity of the library, etc.

Yes, see above for why switching from OpenSSL to OCaml-TLS in an application is not a small task, and one that I’m not quite ready to take on yet.

However the HTTP library doesn’t require any tricky cryptography: it is a well specified text (or now binary) protocol, and that is suitable for an OCaml implementation.

Yes, this is a tradeoff one often has to make, although binding C libraries isn’t trivial either, and some bindings have latent race conditions / bugs and then you spend weeks trying to debug why your OCaml program is segfaulting. (ctypes sometimes helps avoid them, but in other cases it is the source of bugs). More on this later, I’m working on a tool that can detect some of the bugs I ran into lately in this area…

Just to clarify: http-lwt-client comes with a DNS client that will attempt to read /etc/resolv.conf. Curl also comes with its own DNS client with a cache, a happy-eyeballs implementation and everything. I also think this short blog post is worth reading: “Some things about getaddrinfo that surprised me”.

Thanks for your insight. From my learnings, “choose the TLS implementation” has been attempted by conduit - a hard to understand, barely documented, barely maintained library that has various other problems.

And “If the target IP address is already known or has been solved by other means (e.g. configuration file)”: to me it is a question of scope, and to me an HTTP client attempts to access a URL, so there’s usually name resolution included.

But, as said, I hear your demands, and hope you find a working HTTP client for your scenario.

Thanks for your table - but I’m wondering what your selection criteria for an HTTP client are… for me, some other things are important: IPv6 support, certificate verification, licensing (including OCaml deps / system packages).

That’s great to hear, and it is very fine for people to have different demands.

Just for the record, I had similar reservations. Over time, I think we did pretty well with OCaml-TLS (security review, fuzz testing, interoperability testing, security bait); you may enjoy reading the 2015 paper https://usenix15.nqsb.io/. Of course suggestions are always welcome, and I agree that security and cryptography need some clear thinking.

First it needs to work, and we need to be able to integrate it into our build system.
Next it needs to be able to talk to the HTTPS services we are already using or about to use (obviously HTTP/1.x minimum, but in addition to that we are exploring talking to etcd, which does have a backward compat JSON based proxy, but its native interface is gRPC, which means HTTP/2).
Licensing: anything LGPLv2.1 compatible should be fine, since that is the license of the project I’m already working on.
We also currently have a mix of Unix/Threads and Lwt code, so ideally something that can be functorized over both would be good (cohttp fits the bill here), but not necessarily a hard dependency (worst case we could put it into the Lwt part of the code and do an rpc call from the Unix part of the code, but that is not ideal).
Certificate verification: yes that is a requirement (though only with a local CA), we are currently using ‘stunnel’ in client mode for this, but it is not ideal: when connections are dropped or anything goes wrong you now have 2 logfiles to correlate, so error handling is not great: cohttp only sees that a connection dropped or wasn’t established but not why. In another place we used conduit for this, but we had to patch it to support certificate verification the way we wanted it to (we connect to an IP address, not a DNS name, which obviously renders some hostname checking impossible, but we do want certificate chain checking, and in practice all hosts that talk to each other have the same locally generated CA).
TLS stack: we are currently using OpenSSL and deciding whether or not to move away from that is out of scope at the moment.
IPv6 support: would be nice, although we currently don’t yet have IPv6 support in all parts of the code that we’d like to, but if we do it’ll be again a static list of IPv4 or IPv6 addresses to talk to (the hosts are part of a pool and they only talk to each other, and usually there is no DNS set up, and hostnames are assigned statically or by DHCP).

There are probably 2 use cases for http clients: using an http client to interact with a traditional web server and forms, and using an http client to interact with a web service (some form of REST-like API, perhaps even running on localhost).

The former needs all the features: DNS resolution, IPv4/IPv6 support, full TLS chain/hostname verification, streaming support, etc. TBH even an HTTP/1.x client is fine for this purpose. I think most of the libraries discussed so far cater for this use case primarily, which is fine.

Whereas for the latter you probably want to establish one persistent connection (if possible) and perform short requests/replies on that. Service discovery may have already been performed (using DNS or other mechanisms). If possible this one would need HTTP/2 to avoid head-of-line blocking, so it needs to implement a more complex protocol, but the functionality it needs to provide is actually simpler than in the former case (it really only needs to take a URL, and string or JSON as input, and return a string or JSON as a result). In fact even a non-Lwt implementation would suffice, since it wouldn’t make sense to launch tens of thousands of client requests all at once; if that needs to happen, then some prioritization/worker pooling can be done prior to the HTTP layer (you don’t want to flood the other web service…). This is the use case I’m most interested in though.

What you’re describing seems to be a distributed system using RPC calls. It’s more complex than it seems at first glance. You would need a connection pool with error handling, retries with backoff, liveness checks and marking connections as dead, etc. This is before we even get to the session cookie management (you will probably need to authenticate for some of the RPC calls).
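To give a rough idea of just the “retries with backoff” piece, here is a small sketch in plain OCaml. It is an illustration under simplifying assumptions, not taken from any of the libraries discussed: `backoff_delay` and `retry` are hypothetical names, the delay doubles on each attempt up to a cap, there is no jitter, and the `sleep` function is passed in so that blocking code can supply `Unix.sleepf` (Lwt code would need its own variant).

```ocaml
(* Sketch only: capped exponential backoff around a result-returning call. *)

(* Delay before retrying after failed attempt [attempt] (0-based):
   base * 2^attempt seconds, never more than [cap]. *)
let backoff_delay ~base ~cap attempt =
  Float.min cap (base *. (2.0 ** float_of_int attempt))

(* Run [f] until it returns [Ok _] or [max_attempts] calls have failed.
   [sleep] is called with the delay in seconds between failed attempts;
   the last error is returned if every attempt fails. *)
let retry ?(base = 0.1) ?(cap = 5.0) ~max_attempts ~sleep f =
  let rec go attempt =
    match f () with
    | Ok _ as ok -> ok
    | Error _ as err when attempt + 1 >= max_attempts -> err
    | Error _ ->
      sleep (backoff_delay ~base ~cap attempt);
      go (attempt + 1)
  in
  go 0
```

A caller might write `retry ~max_attempts:5 ~sleep:Unix.sleepf (fun () -> do_request ())`, where `do_request` stands for whatever performs the call. Connection pooling, liveness checks and marking connections as dead would all sit on top of something like this.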