I think there are two important next tasks: getting full coverage of all of the infrastructure machines and making the data publicly available. I hope to work on this (in an open-source way) in the future; if anyone else is interested, do let me know!
I’d be delighted to see more engagement on two separate fronts, building on Patrick’s superb start:
work together on reducing emissions from OCaml-based projects, based on this data. For example, how to reuse build results and artefacts and minimise the amount of work we do across the infrastructure. This has implications beyond just ocaml.org, since there are thousands of CI jobs running regularly for OCaml projects across GitHub and GitLab that often repeat work that might otherwise be shared.
continue to extend Clarke – and the other libraries used for power monitoring – to increase their portability and integration into the ecosystem. It should be easier than ever to include power monitoring by default into our OCaml-based infrastructure, and to ensure that the results of this work can be used to create actionable change to drive our emissions down.
…and of course, other suggestions for future directions are welcome!
That’s a great project. I always wonder what the cost associated with testing packages is when releasing Dune; this already helps give a ballpark figure.
I just contributed a provider for France. It was pretty easy, so I encourage others to do the same!
A really quick win for someone who knows about static linking on Linux would be to work out how to statically link clarke against variorum. That would open up monitoring our POWER & ARM64 machines, and also give more accurate statistics.
I wonder, you have some numbers, but would it be feasible to collect the number (of kWh) required for “opam.ocaml.org - updating the opam-repository”, or “opam.ocaml.org - serving the community a day of packages”, instead of “our server farm consumes X kWh”? Same of course for “running ocaml.org” and/or “the OCaml-CI” or “the opam-repository CI”, or “one opam-health check”.
It looks like at the moment, it is rather tough to figure out what effect a certain change is having (since it’s a big compound of services all taken together).
I also wonder what you account for and what you do not, i.e. the use of external services such as Docker Hub, Gandi for name services, GitHub webhooks, and notifications (Matrix? Slack? whatever is used).
I think the easiest way to get to some of those numbers is by monitoring more things, collecting the data and then processing it. For example, we already have per-machine power (and therefore emissions) data. If we had all machines covered, we could produce some approximate numbers for the power/emissions of services, since we know what runs where (we might need to make that data a little more detailed). For machines running multiple services, we might need some extra info, like the CPU utilisation of processes, to use as a proxy for “how much power am I using as a percentage of the total power of this machine”.
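As a rough illustration of the CPU-utilisation proxy described above, here is a minimal sketch that splits one machine's measured power between services in proportion to their CPU usage. The `usage` type, the `apportion` function and the service figures are all hypothetical and made up for this example; none of it is part of Clarke.

```ocaml
(* Hypothetical sketch: split a machine's measured power between services
   in proportion to their CPU utilisation. Names and numbers are illustrative. *)

type usage = { service : string; cpu_percent : float }

(* Returns (service, watts) pairs. When at least one service is using CPU,
   the watts add up to [machine_watts]; otherwise every service gets 0 and
   the idle power is left unattributed. *)
let apportion ~machine_watts usages =
  let total = List.fold_left (fun acc u -> acc +. u.cpu_percent) 0. usages in
  List.map
    (fun u ->
      let share = if total > 0. then u.cpu_percent /. total else 0. in
      (u.service, machine_watts *. share))
    usages

let () =
  let usages =
    [ { service = "ocaml-ci"; cpu_percent = 60. };
      { service = "opam-health-check"; cpu_percent = 30. };
      { service = "other"; cpu_percent = 10. } ]
  in
  apportion ~machine_watts:200. usages
  |> List.iter (fun (name, w) -> Printf.printf "%s: %.1f W\n" name w)
```

This is only a proxy: it ignores idle power, memory, disk and network, so the resulting per-service numbers should be treated as approximate.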
To correlate specific activities with power, I think we’d need even more information: for example, the span of when a particular opam-health check was started and when it ended, or when a new version of dune was released to opam-repository. For those services using OCurrent, that data is actually stored; it is just not particularly easily available and gets garbage collected every now and then.
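To make the idea of attributing energy to a time span concrete, here is a minimal sketch that integrates power samples between a start and an end time (trapezoidal rule) and converts the result into emissions. The `sample` type, the function names and the numbers are assumptions for illustration only; the sketch expects samples sorted by time and does not interpolate at the span boundaries.

```ocaml
(* Hypothetical sketch: energy used during a job, from timestamped power
   samples. [time] is in seconds, [watts] is the machine power at that time. *)

type sample = { time : float; watts : float }

(* Energy in kWh over [start, stop], using the trapezoidal rule on the
   samples that fall inside the span. Samples must be sorted by time. *)
let energy_kwh ~start ~stop samples =
  let in_span =
    List.filter (fun s -> s.time >= start && s.time <= stop) samples
  in
  let rec joules acc = function
    | a :: (b :: _ as rest) ->
      joules (acc +. ((b.time -. a.time) *. (a.watts +. b.watts) /. 2.)) rest
    | _ -> acc
  in
  joules 0. in_span /. 3.6e6 (* 1 kWh = 3.6e6 J *)

(* Emissions in grams of CO2, given a grid intensity in gCO2/kWh. *)
let emissions_g ~intensity_g_per_kwh kwh = kwh *. intensity_g_per_kwh

let () =
  (* One hour at a constant 100 W, sampled every 30 minutes. *)
  let samples =
    [ { time = 0.; watts = 100. };
      { time = 1800.; watts = 100. };
      { time = 3600.; watts = 100. } ]
  in
  let kwh = energy_kwh ~start:0. ~stop:3600. samples in
  Printf.printf "%.3f kWh, %.1f gCO2\n" kwh
    (emissions_g ~intensity_g_per_kwh:200. kwh)
```

With job start and end times taken from a service's own records, the same calculation could be run per job rather than per machine, subject to the caveat above about several services sharing one machine.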
For now, in my opinion, it’s anything that runs on a machine listed at: General - OCaml Infrastructure. Of course, our emissions go beyond that, but given time and resources, that’s where I think we should start. Beyond that, I would work through the external services that we think are big hitters. I think the usage of Docker Hub might be a good candidate, with lots of images hitting it frequently (lots of stored data means potentially lots of hardware churn, which has a serious amount of embodied carbon).
The library for accessing emissions information from electricity grids, called “carbon”, should now be available on opam. This is an initial release and might change a bit (e.g. it depends on an alpha release of cohttp-eio).
```
opam update && opam install carbon
```
Here’s a simple example to get the emission numbers for Great Britain.
```ocaml
(* Fetch the carbon intensity for Great Britain and print it. *)
let () =
  Eio_main.run @@ fun env ->
  Mirage_crypto_rng_eio.run (module Mirage_crypto_rng.Fortuna) env @@ fun _ ->
  Carbon.Gb.get_intensity env#net
  |> Eio.traceln "%a" Carbon.Gb.Intensity.pp
```
Thanks to @emillon for providing a backend for French data and to @reynir for suggesting important changes to the CO2-signal backend before the release.