I’ve been playing with my website lately, more precisely with how the content is delivered to readers. Before, it was merely a boring, static website delivered by Nginx; now it’s a Dream-powered HTTP server with all the pages in memory.
I’ve written about this fun little project, and you may find the article interesting. It covers several topics: fun experiments with the Dream library, HTTP arcana one cannot ignore when implementing a browser-friendly server, and even some Docker, because why not!
Next time you refactor this you may want to look into bytesrw; I’m pretty sure it can save you a few pipes.
Also, your caching strategy for resources can be improved! These five minutes are inelegant :–) The best way to do it for page resources is to version the URLs by adding a query with a stamp (a query which your server can gleefully ignore when serving the contents), e.g.:
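`<link rel="stylesheet" href="/css/style.css?stamp=20240612">` (the path and stamp value here are just made up for illustration).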
As long as your stamp (the query, in fact) doesn’t change, your clients won’t try to download the resource again (for the maximal possible max-age).
For the stamp you have a few options: you can add it manually in your template for certain resources (e.g. you can use the release number for fonts), you can use a hash of the resource content, or a more global hash/version number specific to your whole set of resources (a bump then invalidates all resources). Anything will do; just remember that if you want clients to redownload a resource, the stamp (the query, in fact) will have to change.
This is a great article! It’s impressive how much you can do with a simple setup.
The fact that it’s not worse than using some industry-standard solution is a great sign that you don’t need to overengineer much; you can just use plain solutions.
I’m especially interested in the static binaries approach, and I bookmarked the page.
Very interesting article! I’d like to propose a few ideas and also fix a few problems if you don’t mind.
ocaml-crunch is first and foremost a MirageOS project (not a Tarides one), maintained by several people (some of whom work at Tarides, some at Robur, as well as other people who are part of the Mirage association - see this exhaustive list).
If you want to be a little more exhaustive when it comes to Content-Type, I can recommend Conan, which is a re-implementation of libmagic/file in OCaml. Like all Mirage projects, it can be run without a file system (and therefore in RAM).
To take things a step further, you could try out our unipi unikernel, which has been running for a number of years now (for our website or my blog). Here too, we can easily imagine storing everything in memory, but we’d soon reach a unikernel’s RAM limit of 1 GB (now increased to 4 GB).
Unikernels require a more complex deployment method than a simple Docker setup, but imagining an entire system written in OCaml serving your blog/site is pretty satisfying. Recent changes in albatross (which can be installed via apt - https://apt.robur.coop debian-12 main - on Debian) make deployment much easier.
Thanks for your interest and feedback, everyone! It’s pretty great to read.
@dbuenzli, I’ll definitely keep your pointers and suggestions in mind. I wonder whether what you are suggesting wrt. stamps is easy to implement with Soupault or not. It might be a little involved, but not out of reach.
@chshersh well, I would say it is not worse in the particular setup of my website, which does not have to handle that many requests a day. I wouldn’t bet on it if there were a surge of interest and a lot of readers all at once. I plan to do some benchmarks with Locust soon™.
As for static binaries, happy to help. I find this is something OCaml does pretty well, but which is not that well documented at the same time.
@dinosaure I’ve deployed a fixed version correctly mentioning MirageOS instead of Tarides, thanks for letting me know!
Unikernels are pretty interesting, but my first idea is rather to use this opportunity to learn Kubernetes. Since I can now easily deploy my website as part of a Docker image, it looks like I could set up a simulacrum of a cluster and see how it works.
I’m not familiar with Soupault, I just had a quick look at it.
If you store your content in a git repo, you could perhaps have a CLI tool that takes an input on stdin and simply appends `?stamp=$(git describe --always)` at the end of your href, then use the preprocess_element widget and apply it to the hrefs of your page resources in your template. But I’m sure @dmbaturin can suggest a more idiomatic workflow.
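For what it’s worth, a minimal sketch of such a tool could look like this (the default stamp and the argument handling are just illustrative; the soupault wiring itself is left to the widget configuration):

```ocaml
(* Hypothetical sketch: read hrefs on stdin, append a ?stamp=... query taken
   from the first argument (e.g. the output of git describe --always), and
   print the result.  Not a polished tool, just the stdin-to-stdout shape. *)
let () =
  let stamp = if Array.length Sys.argv > 1 then Sys.argv.(1) else "dev" in
  try
    while true do
      Printf.printf "%s?stamp=%s\n" (input_line stdin) stamp
    done
  with End_of_file -> ()
```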
P.S. I’d just like to mention that, looking at your gzip function again, it shows a bit too much of what should not be done to my taste: the function can leak fds in case of errors, and domains are not meant to be used that way (the idea is rather to spawn one long-running domain per CPU you have). It’s not necessarily more complicated to correct it: use Fun.protect invocations to make sure all your fds get closed even if the function blows up, and use Thread.create, so that the netizens cut and paste correct code.
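To give an idea of the shape I mean, here is a rough sketch (not your function, just an illustration that shells out to the gzip binary via Unix.open_process and needs the unix and threads libraries; adapt as needed):

```ocaml
(* Rough sketch only: pipe a string through the external gzip binary.
   Fun.protect guarantees the child process and its channels are cleaned up
   even if something raises, and Thread.create (rather than Domain.spawn)
   runs the thread feeding the child's stdin, so a full pipe cannot deadlock
   us while we read its stdout. *)
let gzip (data : string) : string =
  let ic, oc = Unix.open_process "gzip -c" in
  Fun.protect
    ~finally:(fun () -> ignore (Unix.close_process (ic, oc)))
    (fun () ->
      let feeder =
        Thread.create
          (fun () ->
            try output_string oc data; close_out oc
            with Sys_error _ -> () (* e.g. gzip died early *))
          ()
      in
      let compressed = In_channel.input_all ic in
      Thread.join feeder;
      compressed)
```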
However, in the spirit of making you get rid of that gzip code entirely: besides Bytesrw_zlib, which I already mentioned (see also this cookbook item), you could simply have used this function in zipc (though that would only support the ill-named HTTP deflate, which expects not a raw deflate but a zlib stream), or decompress, if you manage to sift through the overengineering.
Re: Kubernetes, check out https://k3s.io/ which uses SQLite instead of etcd for its control plane data store, or https://microk8s.io/ from Ubuntu. They are both designed to be tiny minimal clusters. Admittedly though I haven’t used them myself (I just use Kubernetes at work, not for personal stuff so far but in the future who knows).
Regarding the stamping suggestion here, if I recall correctly your post mentions that ocaml-crunch exposes a `val hash : string -> string option` function, so you likely already have a ready-made hash available (although you may have to convert it to hex and trim it to a reasonable size, etc.).
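Something along these lines, perhaps (a sketch only: `Assets` stands in for whatever your crunch-generated module is called, and the raw-to-hex conversion is only needed if the digest is not already hex-encoded):

```ocaml
(* Hypothetical sketch: derive a short URL stamp from the digest exposed by
   the crunch-generated module (here called Assets). *)
let stamp_of_path path =
  match Assets.hash path with
  | None -> "0"
  | Some digest ->
      let hex = Buffer.create (2 * String.length digest) in
      String.iter
        (fun c -> Buffer.add_string hex (Printf.sprintf "%02x" (Char.code c)))
        digest;
      String.sub (Buffer.contents hex) 0 (min 8 (Buffer.length hex))
```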
I’m not saying you are wrong, but IMO, the gzip function is actually not that important. It’s called only before the HTTP server is started, one call after the other. If there is an error in Domain.spawn, the program just hangs (and if it’s outside of Domain.spawn, then it exits, not that gracefully).
It would definitely be different if the gzip function was called by the handler, though.
Edit: it is probably a good idea to provide more context in the article to explain the shortcomings of this gzip function, though, and why it’s not a good idea to use it as-is in a different context (which is your initial concern, if I understand the nice word “netizen” correctly).
Yes. Personally, I read your article quite quickly and I didn’t realize this was done only once – even though it is stated.
In general I think it’s better to be exemplary when code is published, or to mention in a comment alongside the code that it is not (so that the warning gets a chance to be cut along with the code :–). Published code always ends up being cut and pasted around by someone, for play or work.
That’s fair! I’ve added a red block just after the function, explicitly quoting your comment. And I’ll make sure to look at the various links you shared, potentially to fix the issue altogether.
Nice blog post! Regarding compression, have you considered the dream-encoding library by @tmattio? I also abandoned nginx recently and was missing compression, so I liked this library because it only required adding a one-liner, `Dream_encoding.compress @@`, as a Dream middleware (and it handles gzip/deflate with decompress, another MirageOS project, which should be fully compatible with your RAM-only solution out of the box).
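Concretely, the pipeline looks roughly like this (the route is a placeholder; only the compress line matters):

```ocaml
(* Minimal Dream pipeline with the dream-encoding middleware slotted in. *)
let () =
  Dream.run
  @@ Dream.logger
  @@ Dream_encoding.compress
  @@ Dream.router [ Dream.get "/" (fun _ -> Dream.html "Hello, world!") ]
```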
Pretty cool! I need to dig into dream-html at some point, it looks like a very cool project.
I didn’t want to rely on a middleware because I wanted to be sure the compression was only done once at startup. That being said, looking at the API of the lib, I could have used the exposed helpers to determine whether encoding was requested or not.
I had a look at decompress. The complexity is there for pretty good reasons (you need it for the level of control the library targets), but the API does not seem very straightforward.
About decompress, we can indeed tweak multiple “details” like the window or the internal queue used to compress a document. However, I started an API which is probably much easier than the basic one: see here for the DEFLATE format. The documentation provides a little example of how to use it.
The main objective of decompress is to control the memory footprint of the compression/decompression for long-lived programs (like servers) and be sure that we don’t allocate at all even if we need to inflate/deflate multiple files.
This specificity of decompress is actually used by ocaml-git, where you can prepare a few buffers and be sure to extract multiple Git objects without risking an Out_of_memory exception. Again, in the context of a unikernel, we don’t have much RAM, so we must prepare things to be able to serve a service without allocating a lot.
Of course, we’re open to improving the API as needed. A cookbook “à la bytesrw” is also possible, and I’ve got a few decompress tutorials in the works.