OCaml.org: recapping 2022 and queries on the Fediverse

Hot on the heels of the 2022 User Survey results, I thought it a good opportunity to look back over the 2020 results (and summary) and look at some of the highlights of what the ocaml.org team of contributors did in the past year for our ecosystem, and to gather some inspiration on what to focus on next in 2023. As always, these recaps from me are my personal distillation of our community’s work, with me just reporting as best I can. Errors and omissions are mine, and credit to the individual hardworking maintainers!

At the start of 2022, I communicated three priorities to the OCaml.org maintainer teams when asked about what to work on, based on the work of the core development team and the feedback from the 2020 user survey:

  • help the OCaml 5 release be a success
  • launch a new OCaml.org web presence with documentation
  • prototype new workflows for OCaml development

Help the OCaml 5 release be a success

The first thing I found notable (and reassuring) in the 2022 survey results is what wasn’t mentioned in the user responses: instability with the core system. We began 2022 drunk with relief from the multicore PR being merged, and began the year-long 5.0.0 release process. In one of the core developer meetings early in the year, I requested that the 5.0.0 release be restricted to just the multicore runtime and effects features, since so much had changed there that we would have our hands full. This was promptly ignored and followed by a surge of PRs removing lots of legacy cruft that had built up over the course of the 4.x series. A recipe for release management disaster!

I’m glad to say that I was wrong, and (as you’ll see from the infrastructure report below) that the collective 5.0.0 release effort has been one of the most impressive I’ve witnessed. The core team signalled deprecations clearly, and the various tooling teams (such as opam and dune) in the ecosystem performed lots of differential builds and incremental releases to remove vestigial deprecated fragments that now broke builds. This was then followed by an engaged community releasing their various dependencies to the mainline opam repository, all in good time for the 5.0 release candidates to be cut and be usable.

We are, of course, still really early in the lifetime of the OCaml 5.x series, and some serious breakages may yet lurk and only be discovered as our users migrate. But we are at a point now where experimentation, prototyping and migration can be done in a controlled way across both OCaml 4.x and 5.x, so let’s pat ourselves on the backs for a moment about a job well done before moving on. I’ll ninja-edit this post if next year’s user survey is full of complaints about OCaml 5 :wink:

Launch a new OCaml.org web presence with documentation

The 2020 user survey made it crystal clear that we needed to improve the state of the art in OCaml documentation. Accordingly, we started work in 2021 on a new site, previewed it in early 2022 and launched in April 2022.

This new website preserved older links (by redirecting them to an archived v2.ocaml.org) and provided a brand new centralised documentation site with package search and incremental rebuilds to ensure new packages get up there in a timely fashion. This was a complex task behind the scenes, since it requires ongoing bulk package builds of every version of every package in the opam repository, with consistent cross-linking.

It’s still by no means perfect, with some work needed on rendering glitches, missing sections, and the overall information architecture of how to present so much information to a range of users (beginner to advanced), but it’s a really solid foundation to work from (unlike the previous website, which was really showing its age). This year’s user survey continues to emphasise the need for advancing documentation, so I hope to see more contributions to the new website to ensure the content continues to improve. And of course, look to your own published libraries and ensure that your odoc markup is as good as you can make it.

Prototype new workflows for OCaml development

The other hot topic in 2022 was for us to figure out how to integrate better with modern, stateless workflows for code development. This is a complex topic for tool maintainers to work on, since we must also preserve backwards compatibility with existing workflows (witness the very high percentage of users in this year’s survey that use the opam cli as their primary mechanism of interaction). We also had a decent idea of who the various sorts of users are from discussion in 2019 with application developers, library authors and OS maintainers.

There have been a number of prototypes built this year to experiment with new workflows, but most of the effort from core maintainers has gone into the earlier priorities (releasing OCaml 5, especially). The prevalence of dune (>91% in the 2022 survey) means that one simple stateless workflow is to have all the source code available in one monorepo, and perform a dune build. This works because dune can scan all the dune files in the repo and build them in one pass. In this workflow, opam can optionally be used to assemble the source code (see the opam-monorepo plugin), but our Nix-loving friends also have their own alternative mechanisms (1, 2, 3). While some projects like MirageOS and Real World OCaml are using these workflows, they are still maturing. Now that OCaml 5.0 is out and the new website is live, I hope to see more production quality workflows emerging in 2023.
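
To make the "scan all the dune files in one pass" idea concrete, here is a rough sketch in Python. It is not how dune itself is implemented; it only shows that a monorepo's build descriptions can be discovered by a single walk over the tree. The directory layout, the helper names and the skipped directory names are all assumptions made up for this illustration.

```python
import os
import tempfile

def find_dune_dirs(root):
    """Return sorted relative paths of directories that contain a file named `dune`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into build output or VCS metadata (assumed names).
        dirnames[:] = [d for d in dirnames if d not in ("_build", ".git")]
        if "dune" in filenames:
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)

def make_demo_monorepo(root):
    """Create a tiny hypothetical monorepo: one app, two vendored libraries, some docs."""
    for rel in ("app", "vendor/liba", "vendor/libb", "docs"):
        os.makedirs(os.path.join(root, rel))
    for rel in ("app", "vendor/liba", "vendor/libb"):
        with open(os.path.join(root, rel, "dune"), "w") as f:
            f.write("; placeholder build description\n")

with tempfile.TemporaryDirectory() as tmp:
    make_demo_monorepo(tmp)
    demo_dirs = find_dune_dirs(tmp)
    print(demo_dirs)
```

The point of the sketch is that nothing outside the source tree is consulted: once the sources are assembled in one place, the set of things to build is a function of the tree alone, which is what makes the workflow stateless.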

There is also some concern that dune shouldn’t be a hard requirement on any workflow. This requirement has been successfully preserved to date, but is getting increasingly difficult to reconcile with the demand for a more opinionated, beginner-friendly workflow. With opam, our architectural answer to this is to separate the opam file format from the opam CLI, and make it easier to interpret opam repositories via external tools. The same discipline should work for dune files with alternative build systems.
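
As a small illustration of what "interpreting opam metadata via external tools" could look like, the sketch below reads a few fields from opam-style text without going through the opam CLI. It is a toy under stated assumptions: it only handles single-line `field: "string"` and `field: ["a" "b"]` entries, the sample package is invented, and a real tool would want a complete parser for the file format.

```python
import re

# Invented sample metadata in the opam file syntax (single-line fields only).
SAMPLE = '''\
opam-version: "2.0"
name: "example-lib"
version: "0.1.0"
depends: ["ocaml" "dune" "cmdliner"]
'''

FIELD = re.compile(r'^([a-z][a-z0-9-]*):\s*(.+)$')

def parse_simple_opam(text):
    """Parse single-line string and list-of-string fields; ignore everything else."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if not m:
            continue
        key, raw = m.group(1), m.group(2).strip()
        strings = re.findall(r'"([^"]*)"', raw)
        if raw.startswith("["):
            fields[key] = strings        # list value
        elif strings:
            fields[key] = strings[0]     # plain string value
    return fields

meta = parse_simple_opam(SAMPLE)
print(meta["name"], meta["depends"])
```

Keeping the format readable by independent tools like this is what lets alternative front ends consume the same repository.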

OCaml.org and our decentralised future in 2023

How do we – as a community – figure out what will work for new workflows? That’s my segue into what I want to hear your thoughts on for 2023: how to incrementally improve community communication.

  • discuss.ocaml.org (Discussion forum): I set up this forum back in 2017 in response to a user request, and it has been a successful experiment. The number of OCaml users who report interacting via this forum has increased percentage-wise from the 2020 to 2022 results. I am also pleased that a small moderation team has been sufficient to deal with spam. Traffic-wise, we have had to upgrade the hosting capacity several times in the past 5 years as demand rises, and there has been a mild surge in page-views towards the end of 2022 with the release of OCaml 5.0.0.

    I am still regretful that I had to sunset the mailing list service, but it will hopefully be back in 2023, especially if we find a volunteer to help configure and maintain it.

  • watch.ocaml.org (Video sharing): The Peertube-based service began in the 2020 lockdowns when we shifted our workshops online. Since then, it has been a resilient and useful resource to host videos about OCaml related topics. There are some advantages to hosting our own videos: they can be permanently archived and linked to, and we can integrate well with other “Fediverse”-based services (more on this later).

    What I’d like to see in 2023 is more content being backfilled onto this site, so that we can have all the videos from the last few decades of OCaml conferences and workshops in one place! To that end, we will promote the service to non-beta status soon. It is a little tricky to use Peertube with multiple users (still involves password sharing), so we’re figuring it out as we go along and before finishing the promotion to production status. Please do leave comments and thoughts on that issue or here.

  • opam.ocaml.org (Package management): The surveys confirm that opam remains the dominant mechanism of installing and accessing the OCaml package ecosystem. In addition to regular releases of the opam tool itself, the backend infrastructure has been upgraded significantly so that the package archive should be more available and secure, easier to mirror onto global CDNs in 2023, and easier to integrate with other software supply chain security tooling.

    Alongside serving the package archives themselves, we maintain a significant multi-architecture cluster of machines that perform the bulk builds to ensure the health and integrity of the opam repository (the curated package database). These machines comprise x86, ARM, PowerPC and RISC-V machines (yes, we did finally get rackmounted RISC-V boards this year!). The machines are variously hosted at the Cambridge Computer Laboratory, Inria, Scaleway and Equinix, and we are sunsetting our use of AWS for cost reasons. Individual machines have been generously funded by IBM, Tarides, Jane Street, and the Works on ARM program.

    The software driving this cluster continues to grow, with support for Windows and macOS builds going live in 2022. These are not yet hooked up to the live opam-repository-ci, but hopefully will be in 2023. This marks our migration away from hosted CI services such as Travis and AppVeyor, and the backing infrastructure is open source and possible to deploy for yourself (e.g. in an industrial context).

  • www.ocaml.org (Website and Docs): I’ve talked extensively about the new website earlier, but I would like to emphasise the importance of receiving external contributions to the content of the website. The repository is open for PRs, and it has been a little quiet in the latter half of 2022 outside of a small maintainer team. If you’d like to get involved, then please feel free to open an issue and discuss your plans, or signal any blockers you encountered. Gabriel has already noted bottlenecks in the core OCaml distribution, and a similar story is playing out in the wider ocaml.org ecosystem. We need your contributions.

  • git.ocaml.org (not launched): a service that we have considered since 2019 and did not launch is a git mirror, or an alternative way of procuring OCaml ecosystem source code than from GitHub. There has been a small but steady stream of requests for this, with several motivations: availability (GitHub is a central point of failure), security (replicating ecosystem git branches is sound and secure practice), and privacy. The core OCaml team is firmly committed to GitHub at present, but launching a read-only mirror is in scope for 2023 if a maintainer is willing to step forward and survey available solutions (ideally not as heavy-weight as GitLab) for mirroring scripts. Once we have a robust read-only git mirror, we can begin to consider how to accept patch contributions (particularly to the opam repository) via email or other mechanisms, but no promises until we reach the first read-only milestone.

    I’d really like to hear from industrial users who have stronger requirements for secure software supply chains here as well. I participated in a White House summit on software security earlier in the year, and it is clear that this is going to be an important topic for OCaml to keep up with in 2023, especially with our role in the formal verification ecosystem.

Should ocaml.org host more Fediverse services?

I’ve mentioned the Fediverse earlier, and could use a wider set of opinions. One of my concerns from the user survey is how much interaction happens on closed synchronous mediums such as Slack or Discord. I’m not against such platforms (1-1 and small team private chats are not replaceable), but there’s currently no way to then promote knowledge gathered from the closed systems into the public commons, where it benefits newcomers. And more recently, there has been drama around centralised services such as Twitter that throws their permanence into question. Our user survey indicated positive vibes about our current interactions, and I of course want to ensure any of our technical platform choices support this healthy growth.

The Fediverse itself is a fairly loosely arranged set of services that interoperate via two main protocols: ActivityPub for web-based services, and the Matrix chat protocol for encrypted 1:1 and group communications. Some potential services we could host are:

  • Mastodon is a micro-blogging platform which can be run on several domains. It exposes feeds via RSS as well as several open source clients. Fediverse clients can interoperate: a “boost” on a watch.ocaml.org video can be expressed in a Mastodon timeline, and a “favourite” of a video in Mastodon will increment the “like” counter on the video page.

    For ocaml.org, a simple service to run would be an activity feed (e.g. from the opam repository and the website blog) that would publish “Toots” and make them searchable across the wider network, but not allow user registration. This would sidestep the need for moderation and selection of blocklists. However, the hard work of the code of conduct team means that we have the basis for user registration provision as well (especially by using discuss.ocaml.org as a single sign-on backend). Opinions welcome – by default, I will select the conservative option of adding read-only ActivityPub to ocaml.org directly, as we do not currently have the moderation resources for a full Mastodon instance, and it can be upgraded at a later stage.

  • Matrix chat is already sitting alongside the venerable IRC as an open alternative. One of the nicest features of Matrix is that multiple servers can publish the same room, and the domain name is simply a namespace which can be replicated. For example, we have a chat room for the Eio library that is published as #eio:roscidus.com (@talex5’s Matrix server) and #eio:recoil.org (my own). In the future, this could also be #eio:ocaml.org simply by publishing it as such. The value in an ocaml.org Matrix server is thus to act as a conveniently searchable directory, with the room contents being replicated in various other homeservers for availability.
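
The room-naming point in the Matrix item can be sketched in a few lines of Python. A Matrix room alias has the shape `#localpart:server`, and an alias resolves to a room ID; several aliases on different servers can point at the same room. The directory below is a plain dictionary standing in for that lookup, and the room ID is a made-up placeholder, so treat this as a model of the idea and not as client code.

```python
def parse_room_alias(alias):
    """Split '#localpart:server' into (localpart, server)."""
    if not alias.startswith("#") or ":" not in alias:
        raise ValueError(f"not a room alias: {alias!r}")
    # Split on the first colon only, so a server name with a port stays intact.
    localpart, server = alias[1:].split(":", 1)
    if not localpart or not server:
        raise ValueError(f"not a room alias: {alias!r}")
    return localpart, server

# Hypothetical alias directory: alias -> room ID (the ID is a placeholder).
ROOM_ID = "!example-room-id:roscidus.com"
directory = {
    "#eio:roscidus.com": ROOM_ID,
    "#eio:recoil.org": ROOM_ID,
}

def publish(directory, alias, room_id):
    """Publish an existing room under one more alias."""
    parse_room_alias(alias)  # validate the shape before recording it
    directory[alias] = room_id

def servers_for_room(directory, room_id):
    """List the servers that publish an alias for the given room."""
    return sorted(parse_room_alias(a)[1] for a, r in directory.items() if r == room_id)

publish(directory, "#eio:ocaml.org", ROOM_ID)
servers = servers_for_room(directory, ROOM_ID)
print(servers)
```

Adding the `ocaml.org` alias does not move or copy the room in this model; it only adds one more name for it, which is the "searchable directory" role described above.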

There are still significant downsides to using the Fediverse as opposed to centralised services. Usability is patchy, availability can be variable as some servers go down while others remain, and moderation is never a fully solved problem and requires distributed maintenance of blocklists. We’ll need to be open to some experimentation and failures if we step further in this direction, but it is promising.

In this spirit of experimentation, the ocaml.org changes are all now being recorded on a blog (infra.ocaml.org, and at Issues · ocaml/infrastructure · GitHub), and I’ll begin discussions with ecosystem maintainers about how they feel about moving to slightly more open platforms. In the meanwhile, nothing stops independent initiatives. If you feel the urge to continue developing ActivityPub bindings (begun by Kate at a MirageOS retreat but in need of a new maintainer) or bringing the OCaml Matrix implementation to production quality, now would be an excellent time to do so!
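
For anyone curious what a read-only activity feed entry might carry, here is a minimal sketch in Python that builds an ActivityStreams-style `Create` activity wrapping a `Note`. The actor URL, timestamp and helper name are hypothetical, and a real deployment would also need stable `id` URLs, an actor document and delivery logic, none of which is shown.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

def release_activity(actor_url, package, version, published):
    """Build a Create activity announcing a package release (illustrative only)."""
    note = {
        "type": "Note",
        "attributedTo": actor_url,
        "content": f"New release: {package} {version}",
        "published": published,
        "to": [AS_PUBLIC],
    }
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": actor_url,
        "published": published,
        "to": [AS_PUBLIC],
        "object": note,
    }

activity = release_activity(
    "https://ocaml.example/actors/opam-feed",  # hypothetical actor URL
    "cryptokit",
    "1.18",
    "2023-01-01T00:00:00Z",                    # hypothetical timestamp
)
print(json.dumps(activity, indent=2))
```

Because the feed only ever publishes, there is no inbox handling or account management in the sketch, which mirrors the low-moderation option discussed for ocaml.org.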

None of these Fediverse services are intended to replace the excellent roundups often seen on these forums (such as Gabriel’s compiler newsletters) and via the Caml weekly news. If in doubt, feel free to step up with your own projects and post about them regularly here!

Finally, the absolute highlight of OCaml in 2022 for me has been the continued support for Outreachy from our maintainers (see posts and even a video roundup). This effort, along with the code of conduct process concluding, highlights the enthusiasm for bringing newcomers into our world. I encourage the senior members of our community to try to participate (even if just once), and get in touch with me or the OCSF if the bottleneck is something we can help address (like funding).

I’ve never been more excited about the future of OCaml than I am heading into 2023; a whole new realm of systems programming has opened up with the release of multicore and effects, and it’s just really fun and a privilege being along for the ride with such talented collaborators. Happy new year everyone! (I’m currently snowed in somewhere very remote, and am only 50% sure this will make it through to the forum. Please please, don’t give me an HTTP error when I click on ‘create topic’) :slight_smile:

I know I’m no maintainer and this is no survey but please consider sourcehut! It’s lean, easy to maintain, and very email-friendly! It could even double as the new home for ocaml-list.

I’m a big fan of openness by default and I share this concern. I’ve been using Twitter for 10+ years because it favors transparency and I recently started using Mastodon as well. Many people don’t get the value of transparency by default (and I must admit their lack of understanding scares me at times), so the only thing I want to say is that offering a way for folks to communicate privately is essential if we want them to use the communication platform. I believe Mastodon just like Twitter offers such options. I believe we’d have to make sure these features are advertised properly if we don’t want to face rejection.

DMs on both Twitter and Mastodon are technically accessible by the owner of the server. I don’t have confidence in either to communicate privately.

why?

(Discourse doesn’t like me so terse)

because not all conversations should be public. We don’t want to use another platform for private discussions because it’s inconvenient. There are lots of OCaml programmers out there that we never hear about because they just use their favorite messaging platform which doesn’t allow public sharing.

On that topic, I’m wondering if private messages on this forum are actually ‘private’, or are they readable by the admins?

Who is ‘we’?

And convenience and privacy rarely go well together anyway, do they? I’d rather have a public place not mixed with military grade private messages – like the porcelain in kitchen and bath are better separated. Having private and public a single click away sounds like a recipe for disaster.

Which ‘platform’ do you mean btw? The fediverse has no concept of such AFAIK.

Thanks for this recap! Just wondering, in making all these decisions, were environmental issues taken into account (e.g. energy consumption)?

Energy Usage

I punted on this in my recap as I hadn’t had a chance to catch up with @patricoferris about it, but since it’s a very important topic let’s start talking about it incrementally now instead of waiting for that!

In the redesign for the new site, we explicitly removed third-party trackers and took advantage of the spare screen space (usually reserved for a privacy policy, now no longer needed) and put in an OCaml.org carbon footprint statement as a placeholder until we obtained more specific data.

Later in 2022, @patricoferris investigated how we could do better in terms of power monitoring, and is developing a suite of OCaml tools that will hopefully be useful to the wider community as well:

  • Variorum collects hardware-level power counter information, for accurate monitoring.
  • Carbon Intensity is a tool to integrate with country-level APIs for where energy is primarily coming from, in the absence of more specific information from the datacentre provider.
  • Clarke combines all this into convenient Prometheus monitoring, for centralised analysis.
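
The arithmetic behind combining power readings with carbon intensity is simple enough to sketch. The Python below is not taken from any of the tools listed; it only shows the unit conversion, assuming evenly spaced power samples in watts and a carbon intensity figure in grams of CO2 per kWh. All numbers are invented for the example.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s

def energy_kwh(samples_watts, interval_seconds):
    """Approximate energy use from evenly spaced power samples."""
    joules = sum(samples_watts) * interval_seconds
    return joules / JOULES_PER_KWH

def emissions_grams(kwh, intensity_g_per_kwh):
    """Convert energy to emissions using a grid carbon intensity figure."""
    return kwh * intensity_g_per_kwh

# Invented example: a machine drawing a steady 100 W, sampled once a minute for an hour.
samples = [100.0] * 60
kwh = energy_kwh(samples, interval_seconds=60)          # about 0.1 kWh
grams = emissions_grams(kwh, intensity_g_per_kwh=200.0)  # about 20 g CO2
print(round(kwh, 3), round(grams, 1))
```

The accuracy of the result depends almost entirely on the two inputs, which is why better hardware counters and better intensity data are both worth chasing.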

These are all still unreleased, and I’ve opened a tracking issue about the deployment of these into the ocaml.org cluster. If anyone would like to help out (particularly around finding more accurate APIs for carbon intensity) then feel free to open issues/PRs on those various repositories.

Some services, such as restoring inbox.ocaml.org, are a little blocked on this topic as I’m reluctant to provision more long running virtual machines without thinking through more efficient alternatives that can consolidate services (e.g. have just one SMTP endpoint instead of multiple). My apologies to @xavierleroy and @nojb for the delay, as they have both done a bunch of work towards restoring it already, and I’ll do my best to catch up this month.

Privacy

The only digital communications mechanism that we’re using that features end-to-end encryption is Matrix. That implies that, as a general rule, most of the alternatives, such as Slack, IRC, Discord and Mastodon, do allow their respective admins to read your messages. Discourse (the software powering this forum) has explicit support for admins to monitor private messages for online safety reasons, although to my knowledge this facility has never needed to be used for this deployment.

If you want a reasonably usable mechanism for private messages, then Matrix is the way to go, including for encrypted group channels, and all the other services are one security breach away from going public.

As for the discussion about openness, I’m personally not really a believer in being radically transparent when doing open-source work. I find it really difficult to focus on a topic when in the public eye, and instead prefer to work on it with my immediate collaborators and then have an open discussion about it. What I really miss is the ability to promote information that results from the private discussions into a more open forum – all these recaps and newsletters are entirely written from scratch, and the inefficiency means that it’s a huge amount of effort to get right. It’s easy to put the time in with full papers since there is a reward structure (for academics, anyway) in place via the conference circuit, but less so for other mediums. A project I’m going to return to sometime this year is Bushel, where I’ve been prototyping a communications format suitable for iterative promotion and integration with data scrapers.

Source mirroring

To be clear, there’s no special ‘maintainer bit’ or survey required to give your feedback – a maintainer is just someone who puts the time in to help out with a particular area. For example, we got brilliantly helpful external feedback for the opam archive migration here just a few days ago.

I do like SourceHut a lot, but we’d ideally self-host it, and that’s quite a bit of work due to its microservice architecture. It should be possible to strip down the services (remove the autobuilders and bug trackers) for a read-only mirror, and so a good way to contribute would be to assemble a Docker compose file with such an installation and demonstrate how it might work with a sample set of Git repositories to mirror. If you (or anyone else reading this) want to have a go, feel free to create an issue on Issues · ocaml/infrastructure · GitHub with your prototype.

Just a note that GitHub is partially blocked in India by a local ISP: GitHub content domain blocked for these Indian users: Reports - Times of India

The ISP claims that it is due to a court order.

Chiming in to say that I really appreciate that the team is emphasizing the importance of not relying solely on closed proprietary services like GitHub and Discord – it’s okay that they are the primary sources to attract new users easily, but part of being open source is carrying the values of FOSS, which is that a FOSS community should be hosted using FOSS technology (such as this forum, email, IRC, Sourcehut, etc.).

I second the motion to use Sourcehut, though the requirement for self-hosting might be a wrinkle. They are very friendly to open source projects, so I wouldn’t be concerned about hosting on the flagship instance. Otherwise, I think asking around on IRC would be a good start (thanks to said friendliness).

Some perspective:

That University of Lancaster paper is, afaics, projections on top of estimates on top of guesswork. Reminds me of economics papers.

Even in the optimistic scenario that the paper is accurate, OCaml is a niche language with a small community and a small ecosystem. If the entire OCaml ecosystem were rejigged to use a fraction of the energy that it does now, I can’t see that the overall impact would be any more than negligible when compared to other ICT energy usage.

right, each person on this planet is a tiny fraction of the global population (really, 0,00000001%)
so whatever they do is negligible for the overall impact :wink:

EDIT: this was ironical, sorry for not being clear enough

Thanks, @avsm, for your detailed message, although I wish you had kept the Fediverse for a separate post, as it inevitably draws the discussion into platitudes about social networks and everyone’s favorite communication channels.

Focusing back on the ocaml.org website, I have two specific wishes:

  • Having online documentation for OCaml packages is huge progress. Yet, a Google search for a package name (e.g. “ocaml cryptokit”) generally lands on an opam.ocaml.org page (e.g. opam - cryptokit) that does NOT point to the documentation, instead of the ocaml.org page (e.g. cryptokit 1.18 · OCaml Package) that does link to the documentation. Why do we have two different pages with different contents for the same package? Could we merge them into one page that contains everything there is to know about that package?

  • The “Books” page (OCaml Books) is often out of date and needs more maintenance. For example, it ignores version 2 of Real World OCaml and still points to version 1 :slight_smile:

Ok. Following your logic - since it has negligible impact, why is @avsm wasting time on it when he clearly has too much on his plate already? :thinking:

Yes - once the new deployment is up, we intend to retire the opam.ocaml.org package index completely (by retire, I mean it’ll redirect to the new package index on the main site, where the documentation is included).

Trying to take this comment in a more fruitful direction…

I agree that OCaml’s share of the computer carbon footprint is negligible. What then is the goal of pursuing this direction? Is there a demand among government users for these features? Is there commercial demand for this niche where OCaml could fill a role? Or is it because it’s a research area that is currently underserved?

I updated the RWO entry and added Michael Clarkson’s brilliant book from Cornell’s CS3110. We finished importing the last of the OCaml Workshop videos yesterday, so they should be up-to-date from 2012->2022 now when ocaml/ocaml.org#112 is merged.

The papers could use significant backfilling (there have been a lot of OCaml related papers in the last 20 SIGPLAN conferences and affiliated workshops). @octachron suggested BibTeX import for that which should make it more practical.

In general, more hands make light work of this data maintenance. There’s a good CONTRIBUTING.md for the ocaml.org site, so please do all of you look at it and see if you can help improve the job board, or success stories, or papers/books/anything else in there.

Tracking issue is now up at ocaml/infrastructure#26. It covers a few low hanging fruit things we can do straight away that’ll improve the situation.

We are doing it because it is the right thing to do at every level when we are in the middle of a climate crisis. At an individual and organisational level, we all need to reduce our emissions footprints by avoiding wasteful consumption. As computer scientists, we need to develop tools to help society reduce our emissions footprints. As purchasers of computers and materials, we need to apply pressure on our vendors to reduce their emissions footprints and ensure they have responsible supply chains that adhere to good recycling and zero deforestation commitments. As consumers of cloud computing resources, we need to ensure they are minimal in their energy footprints and use of renewable power. As architects of distributed systems infrastructures, we need to ensure we engineer facilities to accurately record our emissions. As members of the global open source community, we need to set an example of best practices that may be replicated more widely.

We learn by doing, and I am very disappointed that the responses here so far include no enthusiasm or encouragement to the interesting OCaml libraries by @patricoferris that I linked to. All of those are reusable more widely, and have obvious applications in reliable energy monitoring (in e.g. embedded systems) in the real world.

And for those unmoved by the burning planet argument, there’s an even easier one. The best way to make computers go fast is to do less work, and that’s what all the disciplined tracking and reproducible infrastructure forming around ocaml.org does: makes it all go faster. Weren’t a bunch of you complaining that the opam package database sometimes took a day to update? Well, it’s rebuilding entire documentation universes in that time now, and package updates should take minutes once we finish the opam2web migration :slight_smile:

Honestly, when the Greenlab’s programming languages paper was published and I first heard about it, I was passionately hoping it’d spark a line of research on the topic, and help start the development of ergonomic tools which bring this concern directly into everyday developers’ workflows, like using gdb or valgrind or perf or sanitizers…

The key to energy efficiency, I feel, at the grain level of a programming language anyway, is to reduce memory traffic and opportunistically fall into SIMD as an optimization and/or intrinsics exposed to the programmer. Both these things need some work in our functional corner of the programming languages world I think. That’s part of the reason I’m excited for the work on local and unboxed data in OCaml.

But really, this is probably a much more important concern for sysadmins. I hope to see a world where energy monitoring tooling is ergonomic and popular…
