Elevator Talks: A Glimpse into My OCaml Side Projects (dev tools)

mbarbin · June 11, 2024, 10:13am

I’d like to share some insights into a few projects that I’m currently developing in OCaml. These projects are a blend of my interest in development tools and my desire to contribute to the community. I hope this overview sparks your interest and opens up opportunities for discussion or collaboration. Let’s dive in!

Dunolint

I’m working in a monorepo that contains hundreds of dune files. It’s not massive, but it was enough to encourage me to create a tool to check invariants in my dune setup and assist with ergonomic issues, such as automatically applying systematic changes across many dune files. It supports things like enabling instrumentation, configuring recurring lint or preprocess flags, sorting libraries alphabetically, etc. Recently, I’ve added support for dune-project files, enabling me to automate workflows like updating dependency bounds for multiple projects within a monorepo simultaneously.
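To make the "sorting libraries alphabetically" rule concrete, here is a minimal sketch using only the standard library. The `sexp` type and the `sort_libraries` function are hypothetical stand-ins for illustration, not dunolint's actual code or API.

```ocaml
(* A tiny S-expression type, just enough to model a dune stanza.
   Hypothetical stand-in: not dunolint's actual representation. *)
type sexp =
  | Atom of string
  | List of sexp list

(* Sort the arguments of every [(libraries ...)] field alphabetically and
   leave every other field untouched. Non-atom entries, such as
   [(select ...)] forms, are kept in order after the plain library names. *)
let rec sort_libraries (t : sexp) : sexp =
  match t with
  | Atom _ -> t
  | List (Atom "libraries" :: args) ->
    let atoms, others =
      List.partition (function Atom _ -> true | List _ -> false) args
    in
    List (Atom "libraries" :: (List.sort compare atoms @ others))
  | List items -> List (List.map sort_libraries items)

(* Example stanza: (library (name my_lib) (libraries stdio base core)) *)
let example =
  List
    [ Atom "library"
    ; List [ Atom "name"; Atom "my_lib" ]
    ; List [ Atom "libraries"; Atom "stdio"; Atom "base"; Atom "core" ]
    ]

let sorted_example = sort_libraries example
```

Working on a parsed tree like this loses comments and layout, which is presumably why a real tool would patch the original text instead of re-printing it.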

The code is currently in a rough state. I’m using it as a buffer to experiment and iterate quickly without needing to consult the dune developers for every minor ergonomic query as I learn more about dune and my requirements. At this point, I’m uncertain whether to consider it as throwaway code or whether it would be worthwhile to refine the scope and user specifications for a project of this nature.

Central

I appreciate the monorepo approach, but I prefer to publish projects separately, sometimes with varying visibility levels (e.g. public vs private repositories on GitHub). I was looking for a workflow that would let me combine the benefits of working with monorepos, while allowing bidirectional promotion of changes between the monorepo and the individual subprojects contained within.

I explored git submodules a bit, but I needed more flexibility in editing the history between the internal operations (within the monorepo) and the final published commits in the individual subprojects. I found it more convenient to version the subprojects as part of the monorepo itself. This approach aligns more closely with git-subrepo, which I used to build some early versions of this project. I’m gradually moving things to an OCaml implementation.

Diff4s

Diff4s is what you get when you rebase a branch under development. You started from a specific upstream revision (old-base = b1), reached a working HEAD (old-tip = f1), and in the meantime, the upstream moved to (new-base = b2). After performing your rebase or merge and resolving conflicts, you end up with a (new-tip = f2).
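As an illustration of the four revisions above, here is a minimal sketch of a diff4 as a record, with a coarse per-file classification. The `verdict` type and the `classify` function are assumptions made for this example. They are not the classification used by patdiff4 or by the library described here.

```ocaml
(* The four revisions involved in a rebase, using the names from the text. *)
type 'a diff4 =
  { old_base : 'a (* b1 *)
  ; old_tip : 'a (* f1 *)
  ; new_base : 'a (* b2 *)
  ; new_tip : 'a (* f2 *)
  }

(* A hypothetical, coarse verdict for one file across a rebase. *)
type verdict =
  | Untouched (* all four versions are identical *)
  | Upstream_only (* the feature did not change this file on either side *)
  | Feature_only (* base and tip are the same before and after the rebase *)
  | Needs_review (* anything else: worth reading again *)

(* Classify one file, given its full contents at each of the four revisions. *)
let classify (d : string diff4) : verdict =
  let feature_is_noop = d.old_base = d.old_tip && d.new_base = d.new_tip in
  let rebase_is_noop = d.old_base = d.new_base && d.old_tip = d.new_tip in
  if feature_is_noop && rebase_is_noop then Untouched
  else if feature_is_noop then Upstream_only
  else if rebase_is_noop then Feature_only
  else Needs_review
```

Only the `Needs_review` files require looking at how the feature diff itself changed across the rebase, which is the "diff of diffs" a diff4 tool is meant to render.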

In this blog post, Yaron Minsky discusses patdiff4, a tool that manages diff4s in Iron, a code review system used at Jane Street.

I contributed to early versions of patdiff4 and Iron. However, that was some time ago (I can’t believe this post is actually 10 years old), and it has also been a while since I left Jane Street.

I recently started regaining interest in this topic and I’ve started developing a library that computes and manipulates diff4s for git repositories. My goals are:

To create a standalone tool that can aid in reviewing complex rebases you might encounter locally.
To develop a library that I can incorporate into a more comprehensive code review system for git, inspired by Iron (see the Cr section below).

I don’t plan to focus heavily on rendering issues. Instead, my aim is to design the tool in a way that leverages the user’s git difftool and mergetool configuration, along with other custom strategies and third-party tools. (As an example, I recently learned about git range-diff, which seems to render some diffs-of-diffs, and read up on some ideas for side-by-side rendering for it).
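For reference, git range-diff takes two commit ranges, so the four revisions of a diff4 map onto it directly. The sketch below only builds the argument list; the function name and labels are hypothetical, and nothing here runs git.

```ocaml
(* Build the argument vector for [git range-diff], comparing the old series
   (b1..f1) with the rebased series (b2..f2). The function name and labels
   are hypothetical; only the git command-line syntax is real. *)
let range_diff_argv ~old_base ~old_tip ~new_base ~new_tip =
  [ "git"
  ; "range-diff"
  ; Printf.sprintf "%s..%s" old_base old_tip
  ; Printf.sprintf "%s..%s" new_base new_tip
  ]
```

The resulting list could be handed to whatever process-spawning function the tool already uses.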

Cr

As previously mentioned, I worked on Iron and used it daily during my time as a developer at Jane Street. For those unfamiliar with Iron, I recommend this public talk.

Nowadays, my development primarily involves git repositories, using a PR model and various GitHub features like CI via GitHub Actions workflows, etc.

Occasionally, I find myself missing certain aspects of Iron, although I haven’t precisely identified what those aspects are. I’ve often contemplated whether elements of an Iron-like workflow could be adapted to decentralized development on platforms like GitHub.

Over the years, several people familiar with Iron have considered similar ideas. In 2018, James Somers wrote a blog post envisioning what this could look like, viewing it through the lens of the editor integration.

I’ve begun prototyping a system that operates with git and supports diff4s. It tracks what you’ve reviewed in each branch and what you need to review when branches change. It also presents you with an aggregated “todo” dashboard across all your repositories, regardless of where they’re hosted.
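The tracking idea can be sketched as a map from branch to last reviewed revision, compared against the current branch tips. This is a simplified illustration under assumed names (`review_state`, `todo`); it is not cr's actual data model.

```ocaml
module String_map = Map.Make (String)

(* For each branch, the revision that was last fully reviewed.
   Branch names and revisions are plain strings in this sketch. *)
type review_state = string String_map.t

(* Given the current tip of each branch, list the branches that need
   attention: those never reviewed, or whose tip moved since the last
   review. The order of [tips] is preserved. *)
let todo (reviewed : review_state) (tips : (string * string) list) : string list =
  List.filter_map
    (fun (branch, tip) ->
      match String_map.find_opt branch reviewed with
      | Some rev when String.equal rev tip -> None
      | Some _ | None -> Some branch)
    tips
```

A dashboard across repositories would then amount to running such a query per repository and concatenating the results.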

I take great pleasure in acknowledging Iron as a source of inspiration for my cr project. However, it’s worth noting that due to the distributed nature of cr, the architectural similarities with Iron might be minimal. Iron is a centralized, comprehensive system with numerous features, some of which I’m probably not even aware of given how long it’s been since I last used it. In contrast, my aim with cr is to create a somewhat minimalistic layer to assist in tracking review states aggregated from many sources, with the intention of making it accessible to a wider audience. I anticipate that the two systems will evolve independently, without maintaining any specific ties.

Currently, I’m using cr to monitor the progress of branches I’m interested in across numerous public git repositories, many of which belong to the OCaml community. This has been an enjoyable experience[^1].

At the moment the review metadata is persisted into a local git repository. I wish to redirect some of this information into the git repositories under review to enable collaboration and sharing of branch metadata (e.g. some json files pushed to a dedicated branch). I plan on using CRDTs for this part.
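The post does not say which CRDT is planned. As one possible illustration, the simplest option is a grow-only set of review marks, where merging two replicas is plain set union. The `Mark` module and the `merge` function below are assumptions made for this sketch.

```ocaml
(* A review mark: "this commit of this branch has been read". *)
module Mark = struct
  type t = string * string (* branch, commit *)

  let compare = compare
end

module Mark_set = Set.Make (Mark)

(* Grow-only set: replicas only ever add marks, and merging is set union.
   Union is commutative, associative and idempotent, so replicas converge
   regardless of the order in which they exchange state. *)
let merge (a : Mark_set.t) (b : Mark_set.t) : Mark_set.t = Mark_set.union a b

(* Two reviewers who recorded marks independently. *)
let alice = Mark_set.of_list [ "feature-a", "abc"; "feature-b", "def" ]
let bob = Mark_set.of_list [ "feature-b", "def"; "feature-c", "123" ]
let merged = merge alice bob
```

A grow-only set cannot retract a mark, so a real design would likely need something richer if review states can be undone.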

An Example Combining the Tools

Let me share how I recently combined these tools to effectively upgrade my code base to the new v0.17 janestreet opam packages.

First, I created a cr-branch in my monorepo where I used dunolint to automatically edit all dune-project files in my project. I made all the required changes to make the tree compile and reviewed the changes with cr. Then, I used central to automatically distribute the changes to each public repository managed via my monorepo.

Finally, I relied on diff4 to assist with rebasing other changes I had in progress across this upgrade.

In Conclusion

I hope to gradually make progress on this over the coming months, identifying and publishing reusable building blocks along the way (e.g. a typed API for git).

Embarking on this project has been a journey outside of my comfort zone. I’m not particularly experienced with open-source development, and this endeavor is shaping up to push me into new territory. Isn’t this where the magic happens?

I’m developing these projects as a part-time hobbyist without external funding. My prototypes are incomplete, flawed and not ready for public use yet. That being said, I’m open to early discussions and am interested in similar work happening elsewhere. If you have overlapping use cases or motivations, I’d love to hear from you! You can reach out to me here, or at any of the email addresses attached to my commits on GitHub.

Best regards, Mathieu

[^1]: I leave you with a demo of a cr session in the terminal:

cr-terminal-session

Khady · June 11, 2024, 1:32pm

I haven’t checked your project yet, but my impression is that it’s definitely something worth keeping alive. We started to develop something with a similar spirit at ahrefs, which is still in its early days too. And I suspect that some other companies would benefit from such checks.

Central is also a tool that looks super useful. We have an internal monorepo and a bunch of small open source repos. Integrating all of those together is not the most pleasant experience.

Thanks for sharing!

mbarbin · June 13, 2024, 11:54am

@Khady it’s encouraging to know that you’re also exploring similar ideas. Thanks for sharing!

I’d be interested in comparing notes if you’re open to it.

Here’s an example of how I use dunolint to edit my ppx config across the tree. (Thanks for the new flag, @NathanReb ! I plan on putting it to good use.)

dunolint-terminal-session

(Note: It’s referred to as ‘dunoscope’ in the gif due to pending renames.)

At present, I’ve designed dunolint as a CLI that’s parameterized by some OCaml files in my tree. A potential enhancement could be to transform it into a distributable binary, with a different means of configuration (maybe an atdgen’d JSON-able config could be a potential direction? I’m still exploring options).

It has some potentially reusable dependencies. For instance, the patching engine it uses under the hood is a textual substitution library I’m working on, drawing inspiration from techniques discussed here. It can preserve elements like comments or other parts of the dune files that the tool doesn’t parse.
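A common way to get this comment-preserving behaviour is to describe changes as replacements of byte ranges in the original text, and copy everything else through verbatim. The sketch below illustrates that general technique with hypothetical names (`edit`, `apply_edits`); it is not the library mentioned above.

```ocaml
(* Replace the bytes in [start, stop) of the original text with [text]. *)
type edit =
  { start : int
  ; stop : int
  ; text : string
  }

(* Apply non-overlapping edits to [original]. Everything outside the edited
   ranges, including comments and formatting, is copied through verbatim.
   Raises [Invalid_argument] on overlapping or out-of-bounds edits. *)
let apply_edits (original : string) (edits : edit list) : string =
  let sorted = List.sort (fun a b -> compare a.start b.start) edits in
  let buf = Buffer.create (String.length original) in
  let pos =
    List.fold_left
      (fun pos e ->
        if e.start < pos || e.stop < e.start || e.stop > String.length original
        then invalid_arg "apply_edits: overlapping or out-of-bounds edit";
        (* Copy the untouched text before this edit, then the replacement. *)
        Buffer.add_substring buf original pos (e.start - pos);
        Buffer.add_string buf e.text;
        e.stop)
      0
      sorted
  in
  (* Copy whatever remains after the last edit. *)
  Buffer.add_substring buf original pos (String.length original - pos);
  Buffer.contents buf
```

For instance, replacing bytes 6 to 9 of `(name foo) ; comment` with `bar` gives `(name bar) ; comment`, with the trailing comment left as it was.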

I’ve been referring to dunolint as “throwaway” code under the optimistic assumption that:

          lim                    features(dunolint) = 0
ecosystem -> even-more-awesome

In other words, perhaps this kind of functionality could be integrated with existing tools such as dune build @lint -w? I’d be more than happy to participate in discussions with the dune developers too if there’s interest in this.

Edit: To accommodate different preferences for discussion, I’ve created a GitHub project for dunolint. Feel free to join the conversation and share your thoughts on the platform that suits you best.

Khady · June 13, 2024, 2:40pm

These are the very early days of the project for us, and it’s pretty specific to our setup, so the code is not ready to be shared. But I think we’d be open to at least sharing our approach and ideas. I’ll message you once we have made a bit more progress, if that’s OK with you.
