While browsing the new version of ocaml.org, I came across this page: Carbon Footprint.
Does anyone know more about it?
In particular, which actions/projects does this sentence refer to: “Over the years, the OCaml community has become more and more proactive when it comes to reducing its environmental impact”?
As part of this journey we have documented our efforts towards becoming Carbon Zero.
That’s interesting; I would love to read about this too.
In the meantime, check out the “Energy Efficiency across Programming Languages” (2017) paper by GreenLab.
Thanks for the questions! Just a quick disclaimer: (a) I can only speak to the work I know about, i.e. from an OCaml Labs perspective, and (b) the new site has lost its banner saying that it is very much a work in progress – especially the copy used on the pages – so that sentence was probably added as a placeholder.
That said, we do have strong intentions of being more cognisant of the environmental impact of the OCaml infrastructure – in particular, the large cluster of machines that runs jobs such as the opam repository CI, the opam health-check and the centralised documentation generation.
In terms of actions already taken, I believe (though I’m not 100% sure) that some deliberation has gone into choosing infrastructure providers whose environmental goals align with our own – @avsm might be able to provide more insight there. I’ll also refer you to the relevant section of the original roadmap, “v3.OCaml.org: A roadmap for OCaml's online presence”.
Of course, switching the cluster to use only renewable energy is not a complete solution: renewable capacity is finite, so when one thing uses it, something else cannot, and the question becomes “is job X worth running at all?”. Which brings me to the future goals (as I see them).
Future Goals
Taking decisive action to reduce the environmental impact means first having a much better understanding of the current environmental impact. This should be fairly achievable using the cluster-management tooling (see OCluster) together with knowledge of the types of machines the jobs are scheduled on. With better reporting we can make decisions with measurable impact.
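To make that concrete, here is a minimal back-of-the-envelope sketch of what per-job accounting could look like. The machine name, wattage and grid intensity are invented placeholders, not measurements from the real cluster:

```ocaml
(* Hypothetical accounting only: the machine, wattage and grid intensity
   below are placeholders. Energy per job is approximated as average power
   draw (W) x duration (h), then converted to grams of CO2. *)

type machine = { name : string; avg_watts : float }

let x86_worker = { name = "x86-worker"; avg_watts = 250.0 }

(* Assumed grid carbon intensity, in gCO2 per kWh. *)
let grid_g_co2_per_kwh = 300.0

let job_g_co2 (m : machine) ~duration_hours =
  let kwh = m.avg_watts *. duration_hours /. 1000.0 in
  kwh *. grid_g_co2_per_kwh

let () =
  (* e.g. a two-hour build on the hypothetical worker above. *)
  Printf.printf "%s: ~%.0f gCO2\n" x86_worker.name
    (job_g_co2 x86_worker ~duration_hours:2.0)
```

Even numbers this rough, fed by the scheduler's knowledge of which machine ran which job for how long, would let us rank jobs by cost and spot the outliers.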
Some potential wins we could probably implement are:
Better caching and sharing of artefacts (there’s a lot of overlap between jobs like the health-check and the docs generation – what could they share?). The infrastructure already tries to re-run jobs on the same machine to make cache hits more likely, which is a step in the right direction.
Sometimes jobs are re-run when perhaps they shouldn’t be. Maybe re-runs should be opt-in rather than automatically rebuilding lots of things.
Surface the reporting – I think there are a lot of jobs running constantly that many people are unaware of (e.g. the health-check); the more use we can get out of the vast amount of data the infrastructure produces, the better.
Surfacing the environmental metrics to users.
Without the metrics system giving us good ball-park figures it would be hard to know whether any changes make a difference, so that, in my mind, is crucial. These are just some thoughts – happy to hear any more suggestions and ideas.
Better caching and sharing of artefacts (there’s a lot of overlap between jobs like the health-check and the docs generation – what could they share?). The infrastructure already tries to re-run jobs on the same machine to make cache hits more likely, which is a step in the right direction.
I would spend some time looking at Bazel’s remote caching as well as Nix’s remote caching. One could either use these systems directly or imitate their approaches within Dune and Esy.
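As a rough illustration of the shared idea behind both – content-addressed caching keyed on everything that affects the output – here is a sketch in OCaml. The helper names are made up; this is not Dune's or Esy's actual API:

```ocaml
(* A sketch of the content-addressed lookup at the heart of Bazel- and
   Nix-style remote caches. A local Hashtbl stands in for the remote
   store; everything here is illustrative. *)

let store : (string, string) Hashtbl.t = Hashtbl.create 64

(* Key on everything that can affect the output: the job spec, the
   hashes of its inputs, and the toolchain version. *)
let cache_key ~spec ~input_hashes ~toolchain =
  Digest.to_hex
    (Digest.string (String.concat "\x00" (spec :: toolchain :: input_hashes)))

let build_or_reuse ~spec ~input_hashes ~toolchain ~build =
  let key = cache_key ~spec ~input_hashes ~toolchain in
  match Hashtbl.find_opt store key with
  | Some artefact -> artefact (* cache hit: no rebuild, no extra energy *)
  | None ->
      let artefact = build () in
      Hashtbl.add store key artefact;
      artefact
```

The design point both systems share is that the key covers the complete job description, so a hit is always safe to reuse and every hit is a build (and its energy) saved.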
If I understand correctly, the effort is (should be?) mostly focused on all CIs and automated builds?
Maybe an overview of best practices for OCaml development (CI, documentation generation, etc.) in terms of environmental footprint should be written and added to v3.ocaml.org?
Quite right, this is by far the most energy-intensive portion of the infrastructure. As a small example, the health-check builds every version of every package against 12 compiler versions (at the time of writing). That’s a lot of building. It’s also very useful for “checking the health” of the opam package ecosystem, and the caching works very well (e.g. http://check.ocamllabs.io/log/1631989667-3ee0d8336d6616a07c1bd28128e70f56cce623e7/4.08/good/0install.2.17 – see the “using cache” note on each build instruction, which just replays the previous build’s logs thanks to obuilder). Similarly, the centralised documentation generation builds the docs for every version of every package. Again, I’ll stress that getting good metrics for energy usage would be very useful in quantifying all of this. I hasten to add that the health-check and doc building are (hopefully people agree) good uses of our resources.
On a per-user basis I’m not sure there’s too much to add, but I’m very open to the idea – feel free to open an issue on v3 with some ideas.
Thanks @patricoferris for the accurate and detailed answer. In recent years, the size of the OCaml build cluster in particular has grown to over 1000 CPUs due to the number of architectures, operating systems and versions that we are building for. This is significant enough that I’ve been paying for reliable carbon offsets to help mitigate it.
When we were designing the new v3 website, there was a hole in the place where the “privacy policy” normally sits. The reason we don’t need a privacy policy is that third-party trackers and such have been purged, and so the natural dual to this was to focus on something more positive, such as firming up our policy on emissions and consumption from the infrastructure. I was very happy to see your question pop up on this forum, and the follow-up discussion, and to know that other people in our community also care.
We’re still at an early stage, but some of the questions that come up are:
classify necessary and unnecessary emissions. An example of a necessary emission would be the compute load for features that are needed for the website and the opam package builds. Unnecessary emissions would be if we are architecturally doing something wasteful (for example, doing repeated package builds and throwing the results away).
how do we eliminate unnecessary emissions: unifying and normalising the data schema for the various services, adding caches to avoid repeat computation (something the existing cluster does, but can always improve), and ensuring we set good HTTP cache-control headers on immutable content to reduce server load (a minimal sketch follows this list).
classifying third-party services: we spread our compute load across multiple providers such as Packet, Scaleway, EC2 and in-house hosting at the Cambridge Computer Lab and Inria. Each of these providers has its own power sources and carbon-neutrality policy, so we need to assemble that data and ensure that we are as healthy in our power usage as possible. The cloud providers are typically all quite good about using renewables, but my own Cambridge hosting at the university is less good (something we are working hard on fixing!).
offsetting necessary emissions: once we’ve figured out the necessary emissions, we need to offset them against trustworthy projects such as nature-based restoration efforts. I’ve been doing that with a back-of-the-envelope calculation for the OCaml cluster for the past five years or so, but with this v3 refresh I’d like to know that I’m being at least vaguely accurate. Our CI infrastructure is good enough now that we could thread ‘power usage’ markers through the logs – just recording which host a job ran on – and figure out carbon outputs from there.
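On the cache-header point above, here is a minimal sketch, assuming cohttp-lwt-unix (the port, route and body are illustrative only): content that never changes gets a one-year, immutable Cache-Control header, so clients and proxies stop re-fetching it.

```ocaml
(* Minimal sketch with cohttp-lwt-unix. Immutable content is served with
   a long-lived Cache-Control header so browsers and proxies avoid
   re-fetching it, reducing server load. *)

let immutable_headers =
  Cohttp.Header.of_list
    [ ("Cache-Control", "public, max-age=31536000, immutable") ]

let callback _conn _req _body =
  Cohttp_lwt_unix.Server.respond_string ~headers:immutable_headers
    ~status:`OK ~body:"versioned, never-changing content" ()

let () =
  Lwt_main.run
    (Cohttp_lwt_unix.Server.create
       ~mode:(`TCP (`Port 8080))
       (Cohttp_lwt_unix.Server.make ~callback ()))
```

This only pays off for genuinely immutable URLs (content-addressed assets, versioned docs), which is exactly what the package and docs infrastructure produces.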
The good folks at ARM suggested having a blog post about our approach when we launch v3, to put on their “Works on Arm” website and encourage other open-source communities to discuss this. I’d be delighted if anyone wants to help figure out some of the above on the existing OCaml infrastructure – there’s a lot to do to tackle the climate crisis, even in our little bit of the Internet!
Thanks @patricoferris & @avsm for the info. As I’m working to “productionize” a SaaS prototype, just this week I’ve started looking into what it would take to make it carbon-neutral or negative… so this topic is really timely.
While I’m new to the subject, it was clear from the start that the lack of power-consumption metrics is the biggest blocker (as you both point out). Picking “greener” compute regions and such is obviously better than not, but especially insofar as one uses shared, on-demand infrastructure (e.g. “serverless” services like Google Cloud Run), it seems really difficult to get tangible consumption numbers. In any case, as you progress in this area it would be great to read up on the methods you use (and I’m sure I’ll be blogging a bit on the subject as well, FWIW).
If it’s a prototype, just run it on your own solar panel, like this version (click with care) of Low-tech Magazine :–)
Note that nowadays you can find a few VM service providers that claim to run on green power without offsetting. I can’t seem to find a list I had at some point, but try looking in Germany.
I’m using this Swiss one for a project; it runs on hydropower (though if you look into the details, I suspect it may also run a little on nuclear power, as it seems to be downstream of this pumped-storage system).
Very neat. Unfortunately, even before the service in question is generally available, there are nontrivial data-residency requirements that probably mean sticking with the largest cloud providers so that the operational complexity isn’t overwhelming.
Thankfully, it seems those providers are doing a lot better on this front than they were just a couple of years ago. Even aside from offsets, options are growing for locating deployments in regions that use majority green power. AFAIK, Google is the most transparent in this regard: they disclose, per region, the percentage of energy consumption coming from carbon-free sources, as well as how much CO2 per kWh their non-renewable power sources emit (Carbon free energy for Google Cloud regions). At least in Europe and the Americas, there appear to be some responsible options.
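For what it’s worth, once that per-region data is in hand, the selection itself is trivial. A tiny sketch with invented placeholder figures (not Google’s real numbers):

```ocaml
(* A sketch of choosing a deployment region from a provider's published
   carbon data. The figures are invented placeholders; the point is only
   that once per-region data exists, picking the greenest viable region
   is a simple sort. *)

type region = { id : string; cfe_percent : int; g_co2_per_kwh : float }

let candidates =
  [ { id = "region-a"; cfe_percent = 90; g_co2_per_kwh = 80. }
  ; { id = "region-b"; cfe_percent = 60; g_co2_per_kwh = 250. } ]

let greenest =
  List.sort (fun a b -> compare a.g_co2_per_kwh b.g_co2_per_kwh) candidates
  |> List.hd

let () = Printf.printf "deploy to %s\n" greenest.id
```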
Totally naïve question: if the infrastructure components were migrated to Mirage, wouldn’t that also reduce power consumption? (By saving the hundreds of thousands of instructions executed by the operating system.)
I know it’s a tricky issue, but it could also serve as a significant scale test for MirageOS.
Even though it’s interesting in itself to compare the relative speed of languages, I don’t think it plays any relevant role in energy consumption compared to what programs are running and for how long.
If anything, it can be a dangerous angle on the problem: it doesn’t take actual energy consumption into consideration, so people can use it as an excuse and say “I’ve done my job in optimizing for energy consumption now”.
I myself have been guilty of writing processes that repeatedly recalculate cacheable results, because it was easier. This kind of programming is a much worse problem.
And then there is the harder problem of “should the code run at all?” – which is a question that will be hard to ask at any company meeting, as energy consumption is not a valued goal in itself, and the question takes the focus away from solving for the success of the company.
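For the record, the fix is usually tiny. A minimal memoization sketch in OCaml (the slow Fibonacci stands in for any expensive, cacheable computation):

```ocaml
(* Wrap an expensive pure function in a Hashtbl-backed cache so repeated
   calls stop burning CPU (and energy) recomputing the same result. *)

let memoize (f : 'a -> 'b) : 'a -> 'b =
  let tbl = Hashtbl.create 16 in
  fun x ->
    match Hashtbl.find_opt tbl x with
    | Some y -> y
    | None ->
        let y = f x in
        Hashtbl.add tbl x y;
        y

(* Deliberately slow stand-in for an expensive, cacheable computation. *)
let rec slow_fib n = if n < 2 then n else slow_fib (n - 1) + slow_fib (n - 2)

let fib = memoize slow_fib

let () =
  ignore (fib 35); (* pays the cost once *)
  ignore (fib 35)  (* second call is just a table lookup *)
```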
I do not agree that the article came to that conclusion. My point is about the comparative relevance of programming-language speed vs. the programs running, independently of what language they are written in – specifically in the context of overall energy efficiency.
Though on further thought, I guess programming-language efficiency might become more relevant when looking at extremely distributed systems, e.g. the JS that everybody runs all the time. Making that much more energy-efficient could, I guess, have some observable effect.