Simple, modern HTTP client library?

lindig · January 24, 2023, 1:28pm

I am looking for a library to make HTTP requests in an existing application, either synchronously or using the Lwt framework. Under the impression that httpaf is up and coming and cohttp on the way out, I looked at:

GitHub - roburio/http-lwt-client: A HTTP client using HTTP/AF and lwt
GitHub - inhabitedtype/httpaf: A high performance, memory efficient, and scalable web server written in OCaml
GitHub - aantron/hyper: OCaml Web client, composable with Dream [unannounced]

And found:

http-lwt-client - has the desired functionality and abstracts nicely from httpaf, but its (transitive) dependencies are huge. I would prefer to rely more on existing OS-level services for DNS and TLS than to use OCaml all the way down, as a Mirage unikernel would require.
Developing against httpaf directly seems complicated (judging from the code in http-lwt-client): because everything is chunked, reading and writing are complicated and look quite imperative.
Hyper looks good but at this point not mature; I don’t know about its dependencies.

I would be interested in the trade-offs in this space and in other viable options. I would have thought that making HTTP client requests is so common that this would be easier.

yawaramin · January 24, 2023, 1:40pm

I have been a happy user of Piaf for a while now, it provides a nice API. If there’s one thing missing I’d say it’s a built-in cookie jar, but you can probably rig one up or find a separate library for it.

anuragsoni · January 24, 2023, 4:53pm

If using libcurl is acceptable, the OCaml bindings for it are a good option as well. I’ve found ezcurl (GitHub - c-cube/ezcurl: A simple wrapper around OCurl) to be a pretty nice high-level Lwt wrapper for the curl bindings.

smondet · January 24, 2023, 7:03pm

OOC what makes you say Cohttp is on the way out?
(I’ve been trying the new Eio backend the past couple of weeks, lots of development: Commits · mirage/ocaml-cohttp · GitHub )

lindig · January 25, 2023, 1:25pm

I gained that impression from performance discussions and this overview: Low-Level HTTP Protocol at ocamlverse.net. Maybe this is not accurate.

h2: High performance http2 implementation.

httpaf: A high performance HTTP implementation written in OCaml. Compatible with Async and Lwt.

cohttp: Older, slower implementation of HTTP supporting only HTTP/1.x.

yawaramin · January 25, 2023, 4:52pm

Interestingly, basically the only way to get a web client with a ready-made cookie jar in OCaml is to use the ocurl ‘low-level’ curl binding. I’m left wondering if people don’t use cookies in their web requests or if they roll their own.
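For the “roll their own” route, a cookie jar can start very small. The following is a hypothetical, stdlib-only sketch (the module `Cookie_jar` and its functions are invented here and belong to no library): it keeps only name/value pairs per host and deliberately ignores the Path, Expires, Secure and Domain attributes that a real jar following RFC 6265 has to honour.

```ocaml
(* A deliberately minimal cookie jar mapping host -> (name, value) list.
   Cookie attributes after the first ';' (Path, Expires, Secure, ...)
   are dropped and ignored. *)
module Cookie_jar = struct
  type t = (string, (string * string) list) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* Extract the "name=value" pair from a Set-Cookie header value. *)
  let parse_set_cookie (v : string) : (string * string) option =
    let first = List.hd (String.split_on_char ';' v) in
    match String.index_opt first '=' with
    | None -> None
    | Some i ->
      let name = String.trim (String.sub first 0 i) in
      let value =
        String.trim (String.sub first (i + 1) (String.length first - i - 1))
      in
      if name = "" then None else Some (name, value)

  (* Store a Set-Cookie value received from [host]; a cookie with the
     same name replaces the previous one. *)
  let add_set_cookie (jar : t) ~(host : string) (v : string) : unit =
    match parse_set_cookie v with
    | None -> ()
    | Some (name, value) ->
      let existing = Option.value ~default:[] (Hashtbl.find_opt jar host) in
      Hashtbl.replace jar host ((name, value) :: List.remove_assoc name existing)

  (* Value of the Cookie request header for [host], in the order the
     cookies were last set, or None if nothing is stored for that host. *)
  let cookie_header (jar : t) ~(host : string) : string option =
    match Hashtbl.find_opt jar host with
    | None | Some [] -> None
    | Some cookies ->
      Some (String.concat "; " (List.rev_map (fun (n, v) -> n ^ "=" ^ v) cookies))
end
```

With a client library, one would call `add_set_cookie` for each Set-Cookie response header and send `cookie_header` back on later requests to the same host; how headers are read and set depends on the library used.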

Frederic_Loyer · January 29, 2023, 8:15pm

I have just tried cohttp (chosen before the opening of this thread), but with a new URL, this breaks:

Fatal error: exception Tls_lwt.Tls_alert(6)

The program is simple:

(* Needs cohttp-lwt-unix and lwt_ppx (for let%lwt / match%lwt). *)
let url = "https://forum.vbulletin.com/external?type=rss2&nodeid=28"

(* GET [url]; return Some body for a 2xx status, None otherwise. *)
let http_get url =
  let%lwt (resp, body) = Cohttp_lwt_unix.Client.get
                           (Uri.of_string url) in
  let code = resp
             |> Cohttp.Response.status
             |> Cohttp.Code.code_of_status in
  if code / 100 = 2 then
    begin
      let%lwt b = Cohttp_lwt.Body.to_string body in
      Lwt.return (Some b)
    end
  else
    Lwt.return None

let () = Lwt_main.run begin
  match%lwt http_get url with
  | None -> Lwt.return (print_endline "HTTP error")
  | Some x -> Lwt.return (print_string x)
end

yawaramin · January 29, 2023, 10:57pm

Looks like there is an open issue with this error: Provide an example of a tls client · Issue #479 · mirage/ocaml-cohttp · GitHub

hannes · January 30, 2023, 8:39pm

Hi @lindig ,

The aim for http-lwt-client is:

allow IPv4 and IPv6 (with the IETF happy eyeballs approach RFC 8305 - Happy Eyeballs Version 2: Better Connectivity Using Concurrency)
allow HTTP 1 and HTTP 2
does certificate validation
provide an easy API and easy to read code

Because of other projects I work on, I used OCaml-DNS and OCaml-TLS, but I’m sure it’s pretty straightforward to (a) implement happy eyeballs using Unix.getaddrinfo (or Lwt_unix); (b) use an OpenSSL binding instead of OCaml-TLS.
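As a rough illustration of the ordering part of (a): RFC 8305 (Section 4) has the client interleave the resolved addresses by family, here starting with IPv6, before making staggered connection attempts. The function below is a hypothetical, stdlib-only sketch of just that step (the name `interleave_by_family` is invented for this example); it is generic over the address type, and the actual connection racing with a “Connection Attempt Delay” (250 ms by default in the RFC) is not shown.

```ocaml
(* Interleave addresses by family, IPv6 first, following the ordering
   step of RFC 8305 Section 4 with a "First Address Family Count" of 1.
   The relative order inside each family (e.g. as sorted by the system
   resolver) is preserved; leftovers of the longer family go last. *)
let interleave_by_family ~(is_ipv6 : 'a -> bool) (addrs : 'a list) : 'a list =
  let v6, v4 = List.partition is_ipv6 addrs in
  let rec merge xs ys =
    match xs, ys with
    | [], rest | rest, [] -> rest
    | x :: xs', y :: ys' -> x :: y :: merge xs' ys'
  in
  merge v6 v4
```

With the Unix library, the input could be the result of `Unix.getaddrinfo host "443" [Unix.AI_SOCKTYPE Unix.SOCK_STREAM]`, with `~is_ipv6:(fun ai -> ai.Unix.ai_family = Unix.PF_INET6)`, which keeps DNS resolution in the system resolver.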

But to be honest, what would we gain from it? The “dependency cone” inside opam shrinks, but the one outside opam grows (libc/libresolv and libssl). As usual, it is a tradeoff (and of course I understand if you “just want an HTTP client”; but I’m not very convinced by passing strings to curl or libcurl: in OCaml we have types :D). Similar to http-lwt-client, there is http-mirage-client for the MirageOS ecosystem.

hannes · January 30, 2023, 8:41pm

Frederic_Loyer:

I have just tried cohttp (chosen before the opening of this thread), but with a new URL, this breaks:

Fatal error: exception Tls_lwt.Tls_alert(6)

The program is simple:
let url = "https://forum.vbulletin.com/external?type=rss2&nodeid=28"

Interesting: I just tried hurl.exe on that URL (“vBulletin Community Forum - vBulletin Announcements”), using hurl from http-lwt-client, and it returned HTTP status 200 and some data.

I don’t quite understand the Tls_lwt exception you get, but it likely comes from somewhere deep inside cohttp / conduit. I recommend giving http-lwt-client a try.

edwin · February 1, 2023, 12:13am

It would be good to leave the choice of the TLS library to the end application, to ensure that only one is linked.
E.g. if an application acts as both TLS server and client (e.g. a webserver that makes API calls to other web services) then you very likely want either:

OpenSSL on both sides
or OCaml-TLS on both sides
but not OpenSSL on one side and OCaml-TLS on the other. This combination would just increase the attack surface, since you would now potentially be vulnerable to the union of OCaml-TLS and OpenSSL security issues, instead of just one of them.

Large dependency trees aren’t necessarily a problem on their own if all the dependencies (and their reverse dependencies already used by a project) are well maintained, and quick to update to support new compiler versions, or new versions of dependent libraries.
However when you already have a project with a fairly large dependency tree (e.g. XAPI with ~200 packages), then adding ~50 more increases the chance that there will be version conflicts, or one package will hold back updating lots of other packages. And it might mean that you’d need to upgrade some of your dependencies before you’re even able to install or try out a new library, except that upgrade might have breaking changes which might need further library updates or code updates to support new versions, etc.
If the dependencies are essential (e.g. the http implementation itself, etc.) then I don’t mind so much having additional dependencies, however in this case the dependencies would be almost completely unused (happy eyeballs and domain lookup), while still adding a maintenance overhead on the overall project.
If the target IP address is already known or has been resolved by other means (e.g. a configuration file), then happy-eyeballs isn’t really required, and it would be great if it were optional.
There are of course tradeoffs here, because the alternative to lots of small dependencies are large monolithic frameworks, which have their downsides too.

Hope this provides some background on why smaller or optional dependencies are preferred by some users (and none of the above means there is anything wrong with the quality or usefulness of ‘happy-eyeballs’ or ‘domain-name’ when they are actually needed by the application).

yawaramin · February 1, 2023, 12:19am

Didn’t the Mirage team kinda pioneer the concept of functorized swappable dependencies? Seems like the TLS implementation fits that bill.
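A minimal sketch of what that could look like, assuming a hypothetical `TLS` signature (it is not the interface of ocaml-tls, ocaml-ssl or any existing package): the HTTP client is a functor over the TLS layer, so the application picks the implementation when it instantiates the client, and a server and client in the same program can be instantiated with the same module, which addresses the point about linking only one TLS library.

```ocaml
(* Hypothetical TLS interface that the HTTP client depends on. *)
module type TLS = sig
  type conn
  (* A real implementation would verify the certificate against [host]. *)
  val connect : host:string -> port:int -> conn
  val write : conn -> string -> unit
  val read_all : conn -> string
end

(* The HTTP client is written once, against the signature only. *)
module Make_client (Tls : TLS) = struct
  let get ~host ~path =
    let conn = Tls.connect ~host ~port:443 in
    Tls.write conn
      (Printf.sprintf "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
         path host);
    Tls.read_all conn
end

(* A fake backend for testing; in a real application this slot would be
   filled by an OpenSSL binding or by OCaml-TLS. *)
module Fake_tls : TLS = struct
  type conn = { host : string; mutable sent : string list }
  let connect ~host ~port:_ = { host; sent = [] }
  let write c s = c.sent <- s :: c.sent
  let read_all c =
    Printf.sprintf "HTTP/1.1 200 OK\r\n\r\nhello from %s (%d request(s))"
      c.host (List.length c.sent)
end

module Client = Make_client (Fake_tls)
```

Swapping the TLS library then means changing only the `Make_client (...)` application, not the client code.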

dinosaure · February 1, 2023, 12:35pm

It seems that we have 2 questions:

a simple library to do HTTP requests
the ability to choose dependencies to implement TLS layer (for instance)

It is really hard to solve both questions at the same time. The first calls for the simplest possible API for sending a request. The second calls for abstracting over all the details involved in making an HTTP request, and it requires the end user to understand these details and to want to change them. Therefore, this view inevitably makes the API more complex (at the expense of the first point). In this respect, a simple API necessarily makes arbitrary choices regarding dependencies and implementations in order to “hide” these technical details.

Now, from the MirageOS perspective, as @yawaramin said, we want to abstract everything (including the TCP/IP stack). This means above all that there are levels at which users can diverge from the choices we have made (such as the choice of ocaml-tls), for example with respect to http-lwt-client.

These levels are not necessarily highlighted, but they exist, and http/af is already one of them. Its design has the advantage of leaving the implementation of the I/O to the user (which is why integrating it into MirageOS was not so difficult).
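To make “leaving the I/O to the user” concrete, here is a toy, stdlib-only state machine in that style. It is loosely modelled on the http/af design (the names `next_write_operation`, `report_write_result`, `read` and `read_eof` echo it), but it is not http/af’s API and it only handles a single GET with `Connection: close`. The state machine never touches a socket; the driver, here a hypothetical in-memory fake, owns the transport and the scheduling.

```ocaml
(* What the state machine asks its driver to do next. *)
type write_op = Write of string | Yield | Close

type conn = {
  mutable pending : string; (* request bytes not yet written *)
  response : Buffer.t;      (* response bytes received so far *)
  mutable eof : bool;       (* the peer has closed its side *)
}

let create ~host ~path =
  { pending =
      Printf.sprintf "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
        path host;
    response = Buffer.create 4096;
    eof = false }

let next_write_operation c =
  if c.pending <> "" then Write c.pending
  else if c.eof then Close
  else Yield

(* The driver reports how many bytes of the last [Write] it managed to send. *)
let report_write_result c n =
  c.pending <- String.sub c.pending n (String.length c.pending - n)

(* The driver hands over bytes it read, or signals end of input. *)
let read c buf off len = Buffer.add_subbytes c.response buf off len
let read_eof c = c.eof <- true
let response c = Buffer.contents c.response

(* A driver over a fake transport that accepts at most 8 bytes per write;
   a real driver would use Unix, Lwt or Eio sockets (plus TLS) instead. *)
let run_fake c ~canned_response =
  let sent = Buffer.create 64 in
  let rec loop () =
    match next_write_operation c with
    | Write s ->
      let n = min 8 (String.length s) in
      Buffer.add_string sent (String.sub s 0 n);
      report_write_result c n;
      loop ()
    | Yield ->
      (* Request fully written: "receive" the canned reply, then EOF. *)
      let b = Bytes.of_string canned_response in
      read c b 0 (Bytes.length b);
      read_eof c;
      loop ()
    | Close -> Buffer.contents sent
  in
  loop ()
```

Because the protocol logic is pure, the same `conn` could be driven by an Lwt loop, an Eio fiber or a MirageOS flow; only `run_fake` would change.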

Leaving the choice of the I/O means leaving the choice of the TCP/IP stack but also, basically, of the communication medium used by HTTP, which can be replaced by a composition of TCP/IP with TLS, for example. http/af leaves open another, more subtle possibility that makes integration a bit more difficult: the scheduler. Indeed, I/O management is intrinsically tied to the scheduler. Thus, it is not only a question of providing an implementation of a protocol (TCP/IP or TCP/IP + TLS) but also of implementing a way to schedule the read/write operations of the HTTP protocol.

This is where a first choice, from Mirage’s point of view, is made: to implement the scheduling of http/af. As far as Mirage is concerned, the choice was to use Lwt. Thus, paf offers a scheduling implementation independent of the protocol (as long as the latter respects our Mirage_flow.S interface).

Then, on top of this abstraction, another choice can be made: the protocol. Again, based on our experience with CoHTTP and Conduit, we decided to use mimic as the protocol implementation. Like virtual methods in OCaml, it has the advantage of letting users inject their own protocol afterwards.
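The “inject your own protocol afterwards” idea can be sketched with first-class modules. This is not mimic’s API; it is a hypothetical, stdlib-only illustration (the `FLOW` signature, `registry`, `register` and `request` are invented here) of how it differs from a functor: the HTTP code depends only on a signature, and concrete implementations are registered at run time, in this case keyed by URI scheme.

```ocaml
(* Hypothetical interface that every protocol implementation provides. *)
module type FLOW = sig
  val scheme : string
  val send : string -> string (* send a request, return the reply *)
end

(* Registry of implementations, filled in by the application at run time. *)
let registry : (string, (module FLOW)) Hashtbl.t = Hashtbl.create 8

let register (module F : FLOW) =
  Hashtbl.replace registry F.scheme (module F : FLOW)

(* The HTTP side selects an implementation dynamically, with no functor. *)
let request ~scheme payload =
  match Hashtbl.find_opt registry scheme with
  | None -> Error ("no protocol registered for " ^ scheme)
  | Some (module F : FLOW) -> Ok (F.send payload)

(* Two toy implementations; real ones would wrap TCP or TCP + TLS. *)
module Plain = struct
  let scheme = "http"
  let send req = "plain:" ^ req
end

module Secure = struct
  let scheme = "https"
  let send req = "tls:" ^ req
end

let () =
  register (module Plain);
  register (module Secure)
```

The trade-off is that a missing implementation is only detected at run time (the `Error` case), whereas a functor application fails at compile time.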

This is more a matter of which abstractions we can or want to offer than a real choice of protocol implementations. Indeed, we could have chosen abstraction through functors (again, as mentioned by @yawaramin), but our experience showed us the limits of that approach.

Finally, the real choice that interests us regarding MirageOS is the injection of a particular TCP/IP stack. In this, http-mirage-client has made the choice to use ocaml-tls while waiting for the choice of a TCP/IP stack (which will be given afterwards by the mirage tool). It should be noted that there is finally an equivalence between http-lwt-client and http-mirage-client and we are thinking about factoring the first with the second as a specialization with the TCP/IP stack of the host system.

All this to say that there are several levers that users can play with. Mirage’s approach has always been to offer these levers at all levels and, of course, this requires, at each level, a thorough knowledge of what is going on. http-lwt-client is just the result of all the choices one can make about HTTP. The real question now is not so much the API or the choice of dependencies, but at what level you are working.

PS: I didn’t mention h2 on purpose to stay on a basic explanation but of course, since h2 shares the same design as http/af, the clear advantage of all these levels is to offer requests with http/1.1 and h2!

edwin · February 2, 2023, 12:04am

Thanks for the detailed explanation, let me try to summarize the various choices in this table, please let me know if I missed any or got it wrong:

name	OCaml dependencies	System dependencies
ezcurl	5	1
hyper	44	2
piaf	50	2
cohttp-lwt	74	3
paf	81	3
http-lwt-client	92	3

Dependencies were counted by installing each package into a fresh switch on OCaml 5.0 where possible, otherwise 4.14.1, running opam list and subtracting 11, the number of base packages in a vanilla switch; system dependencies were counted with opam list --depexts on Fedora 37. The checkmarks are based on project descriptions, dependencies and this thread; I haven’t actually tested whether each feature works. I didn’t list http/af and h2 in the table because they seem to be low-level libraries that are not meant to be used directly.

http-lwt-client is considered to “support” Mirage through “http-mirage-client”.

‘ezcurl’ is considered to support OpenSSL on some OSes (it may also use GnuTLS or NSS).

I didn’t list HTTP/3, because only ‘curl’ would support that, and even there the support is experimental.

None of the libraries “tick all the boxes”, so there is a paradox of choice here (still, the original ‘cohttp’ comes closest to ticking them all, were it not for the lack of HTTP/2 support).

I don’t think it is unreasonable to expect a small dependency cone for http-lwt-client when its README claims it as a feature, however according to the above table its dependency cone is the largest.
If a small dependency cone is not a goal of the project, that is fine, however advertising it in the README sets an expectation for users (to be fair though, its direct dependency list is smallish).

To answer the question about level: we are currently using a combination of ‘cohttp’ applied with a custom ‘Unix’ functor, and ‘cohttp-lwt-unix’ + ‘lwt_ssl’. However, this is limited to HTTP/1.x, so we wanted to investigate and try something new, with the eventual goal of perhaps using gRPC.
It is not very clear what conduit’s “successor” is (or whether it needs one):

tuyau (which apparently got merged into conduit?)
gluten
mimic

IIUC then mimic is the preferred choice nowadays?

Thanks for the mention of piaf, btw. I was aware of paf but hadn’t noticed piaf because of its similar name (it is a completely different project).

There are of course lots of other criteria for choosing an HTTP library: most importantly “does it work”, whether the examples from its documentation work, and the maturity level of the library; we haven’t tried them all yet. And although ezcurl seems like a nice choice in terms of dependencies and API simplicity, it failed to send an HTTP POST with JSON contents (PR opened; it seemed easy enough to fix!).

I’m also aware of the ocaml-grpc library, however it might be a bit too early to try it out, it doesn’t seem to have any TLS support at all yet (although that functionality could perhaps be built on top of it).

yawaramin · February 2, 2023, 1:59am

Couple of small points here.

Piaf supports OCaml 5 using Eio: Add multicore support by fraidev · Pull Request #151 · anmonteiro/piaf · GitHub

Ezcurl is a wrapper for ocurl, which is a curl binding. So at least in theory it supports anything curl does.

Having said that, if your goal is gRPC, for production use I’d probably recommend setting up a forward proxy that upgrades requests to TLS-encrypted. Check this post for insights: HAProxy as Egress Controller - HAProxy Technologies

With this setup you wouldn’t need any OCaml code changes for TLS support for your gRPC client.

edwin · February 2, 2023, 9:49am

Thanks, I’ll update the table once the version containing that change is available on ‘opam’. At the moment the ‘piaf’ in opam has a constraint ‘< 5.0’, and the master version of ‘piaf’ that would have OCaml 5.0 support doesn’t build: piaf.opam: eio and multipart_form are dependencies by edwintorok · Pull Request #163 · anmonteiro/piaf · GitHub

dinosaure · February 2, 2023, 10:00am

Thank you for this synthesis. I would first like to reaffirm what @hannes said: the dependency metric can be biased. OpenSSL or curl can be seen as “big” dependencies (which add a lot of C code), whereas http-lwt-client is written only in OCaml (with some C for performance).

It’s hard to make a fair comparison between all these projects. Again, the APIs they offer are not the same and do not have the same purpose. Then, one can reasonably say (at least, this is my point of view) that it is more interesting to have only OCaml dependencies than to use C libraries (with all that this implies, as we know as OCaml users).

Finally, as far as paf is concerned, once again it only takes care of the scheduler. Only paf.mirage uses ocaml-tls and it is possible to use OpenSSL with paf if you wish.

Concerning the history between tuyau, conduit and mimic, to make it very quick: tuyau is an experimental project. We wanted to integrate it into conduit, but an internal decision made us abandon that integration (in other words, Conduit today is basically the same as it was 5 years ago). mimic is the usable (and used) version of tuyau. Finally, gluten is directly inspired by Conduit.

I wouldn’t want to say that mimic is the best choice. In truth, mimic addresses a very specific issue related to MirageOS: experience has shown us a scalability problem with Conduit for slightly more complex unikernels. If unikernels are not on your roadmap, I advise against using mimic. Otherwise, in the context of a unikernel, it is indeed advisable to use mimic, which integrates better with unikernels such as opam-mirror.

I’m of the school of thought that the ultimate solution doesn’t exist. http-lwt-client was made because we had a lot of (still existing) problems with TLS and Conduit, and it was difficult for us to keep evolving Conduit; at least, I made it clear several times that I wouldn’t maintain Conduit anymore. It is perfectly understandable that http-lwt-client is not the solution for everyone who wants to make HTTP requests. The main reason I’m here is to clarify our goals and how we thought about these libraries, or at least to clarify the evolution and direction of these projects. And I much prefer this fragmentation to a centralization of the problem that would concentrate all the misfortunes of the world. In short, everyone is responsible, and I just hope that all this information will help future readers in their choices.

PS: I would like to insist even more that the choice is not - and should not be - only technical. A lot of social relationships build the choices we make. We have to keep this in mind.

emillon · February 2, 2023, 10:22am

For completeness I’d add curly which shells out to the curl binary, with no C stubs. Not for all use cases but we’re happily using that in dune-release for example.

edwin · February 2, 2023, 2:30pm

I very much prefer OCaml code to C code where possible for security and stability reasons, but stopping at the ‘Unix’ boundary: there is normally no need to reimplement a DNS resolver for example because one already exists and is configured by the admin of the system, introducing one in your own application then puts all the configuration and maintenance and debugging cost on each application.
E.g. using the system resolver allows others who have no knowledge of OCaml to debug and fix it.
And OCaml developers may not all be knowledgeable enough about DNS internals to debug it when it goes wrong, so maintenance is essentially restricted to the intersection of ‘those with OCaml skills’ and ‘those with DNS/sysadmin skills’, which is a much smaller pool than either of them.
Of course other environments will want to make different choices (e.g. unikernels).

I’d love to be able to use something other than OpenSSL for TLS, but that would first require convincing our security team that it is a viable replacement and is not vulnerable to at least the known OpenSSL vulnerabilities. And every time there is a new OpenSSL vulnerability I’d very likely be asked whether my application (which uses ocaml-tls) is vulnerable or not, which would either require me to become a TLS security expert, or to do some digging/testing every time (potentially in collaboration with upstream ocaml-tls). And someone will eventually want a FIPS-certified version of it.
That is a much larger investment than just picking an HTTP library, so I hope you understand why flexibility here is desirable: it lets one introduce such changes incrementally.

Thanks for the clarification

True, I usually approach using a new OCaml library with the expectation that I’ll have to do some minor fixes to make it suitable for my use-case (or to fix bugs revealed by my use case), as long as basic functionality is there (e.g. establishing and verifying the connection).
Even though ideally an http library is probably something that one may expect to just work with all common use-cases.
However deployment and long-term maintenance of an application can also be a concern, and early choices in the libraries used can have a significant impact.

I wouldn’t want to reimplement certificate and hostname checking myself though, which is why an existing solution is probably preferable. I don’t think OpenSSL comes with a built-in solution here, e.g. curl used to do its own on top.

anuragsoni · February 2, 2023, 2:44pm

Not to derail the conversation, but OpenSSL has shipped an option to validate certificate hostnames since version 1.1.1. You can use SSL_set1_host to ask OpenSSL to validate the hostname in the peer certificate.
