When a popular package with many reverse dependencies introduces breaking changes, what practical options exist for the community to manage the transition?
An example will be illustrative: in the recent release of ppxlib 0.36, a change to the internal AST required PPX authors to use a different representation for certain AST nodes. Based on the description of the changes, it sounds like a PPX may have compatibility with ppxlib 0.35 or 0.36, but not both (do correct me if I’m wrong).
Since ppxlib has over 200 reverse dependencies, it will obviously take time for all of those to see updates. For example, here are some packages with disjoint constraints on the ppxlib versions they support at this moment:
Is there a supported way for a project to take a dependency on both of these during the transition? My mental model was that all units of a compilation need to see the same interface to any module dependency X, and that this would prevent different package versions from coexisting in a project.
This post is not intended as a criticism of any package maintainers, nor as a debate about when breaking changes are appropriate. However, I am curious about ppxlib in particular since it’s part of The OCaml Platform, and therefore it sets an example for best practices in the community.
Without invalidating the more general question you are raising, the ppxlib maintainers opted for an “upgrade the world” approach to compatibility. Ppxlib is in a specific spot where this seems to make sense. (I cannot say from personal experience how it works out in practice; maybe you are reporting that there are issues with this approach.) But we cannot take ppxlib as an example that can be transposed to the rest of the ecosystem.
Thanks @gadmm, I recall that post now that you bring it up.
As to how well “upgrade the world” is working out in practice… all I can say is that I didn’t have to look hard to find the version conflict in my example above. I noticed a few others when upgrading packages for one of my projects.
I wonder if there’s a way to query OPAM metadata for real numbers on this. E.g., what percentage of ppxlib’s reverse dependencies have a version constraint like “>= 0.36.0”. @kit-ty-kate, is this something that you might know? Are snapshots of the database made publicly available?
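As a rough idea of what such a query could compute, here is a minimal OCaml sketch. It assumes the constraints have already been extracted from opam metadata into a simple record; the `constr` type, the helper names and the sample packages are all hypothetical, and the numbers do not reflect the real opam repository.

```ocaml
(* Hypothetical data model: not an opam API, and the sample below is made up. *)

(* Parse "0.36.0" into [0; 36; 0]; non-numeric components are not handled. *)
let parse_version (s : string) : int list =
  List.map int_of_string (String.split_on_char '.' s)

(* A simplified constraint: optional inclusive lower bound,
   optional exclusive upper bound. *)
type constr = { lower : string option; upper : string option }

(* True when the package declares a lower bound of at least [v]. *)
let requires_at_least (v : string) (c : constr) : bool =
  match c.lower with
  | Some lo -> compare (parse_version lo) (parse_version v) >= 0
  | None -> false

(* Percentage of dependents whose lower bound is [v] or newer. *)
let percent_requiring (v : string) (deps : (string * constr) list) : float =
  let total = List.length deps in
  if total = 0 then 0.0
  else
    let matching =
      List.length (List.filter (fun (_, c) -> requires_at_least v c) deps)
    in
    100.0 *. float_of_int matching /. float_of_int total

(* Made-up reverse dependencies, for illustration only. *)
let sample =
  [ ("pkg_a", { lower = Some "0.36.0"; upper = None });
    ("pkg_b", { lower = Some "0.28.0"; upper = Some "0.36.0" });
    ("pkg_c", { lower = Some "0.14.0"; upper = None });
    ("pkg_d", { lower = Some "0.36.0"; upper = None }) ]

let () = Printf.printf "%.1f%%\n" (percent_requiring "0.36.0" sample)
```

Note that a lower bound only says a package has moved to the new version; packages like `pkg_c` with an old, unbounded constraint would need to be checked separately to know whether they actually build.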
P.S.: I’ve updated the thread title to better reflect the scope of discussion.
I wonder if we could have a semi-automated way to add upper bounds (only for deps syntactically without upper bounds) to dependencies of packages in the opam repository, at dependency versions where a dependency introduces breakage.
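To make the idea concrete, here is a small OCaml sketch of the "only add a bound where none exists" rule. The `constr` record and the `cap_at` and `render` helpers are hypothetical, and the sketch caps every unbounded constraint without checking whether the package is actually broken, which a real tool would need to do.

```ocaml
(* Hypothetical model of a dependency constraint; not an opam API. *)
type constr = { lower : string option; upper : string option }

(* Parse "0.36.0" into [0; 36; 0]; non-numeric components are not handled. *)
let parse_version (s : string) : int list =
  List.map int_of_string (String.split_on_char '.' s)

(* Add an exclusive upper bound at [breaking], but only when the constraint
   has no upper bound yet and its lower bound is below [breaking]. *)
let cap_at (breaking : string) (c : constr) : constr =
  let below_breaking =
    match c.lower with
    | Some lo -> compare (parse_version lo) (parse_version breaking) < 0
    | None -> true
  in
  match c.upper with
  | None when below_breaking -> { c with upper = Some breaking }
  | _ -> c

(* Render in the style of an opam dependency filter. *)
let render (c : constr) : string =
  let lo =
    match c.lower with Some v -> [ Printf.sprintf ">= \"%s\"" v ] | None -> []
  in
  let hi =
    match c.upper with Some v -> [ Printf.sprintf "< \"%s\"" v ] | None -> []
  in
  match lo @ hi with
  | [] -> ""
  | parts -> "{" ^ String.concat " & " parts ^ "}"

let () =
  let stale = { lower = Some "0.14.0"; upper = None } in
  print_endline ("\"ppxlib\" " ^ render (cap_at "0.36.0" stale))
```

Constraints that already carry an upper bound, or that already require the breaking version or newer, are left untouched.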
This is manual, but it is already happening. The issue raised above, if I understood correctly, is that now there are no versions of some packages supporting the new library, making it (at least temporarily) impossible to upgrade in some cases.
To build on this a little bit: it might not be possible to use a new package version which contains a critical fix, or which resolves a different versioning constraint elsewhere. Of course, the likelihood of this depends on many factors including the number of dependencies in a project.
A second practical issue applies to OCaml newcomers, or to anyone who just wants a sane build environment without a lot of fuss. Continuing the example from the top of the post: if I ask opam to install both ppx_sexp_conv and bisect_ppx in the same environment, it will correctly choose a slightly older version of ppx_sexp_conv, since ppxlib 0.35.0 is compatible with that and with bisect_ppx. But that’s only because bisect_ppx has a version constraint on ppxlib that correctly excludes 0.36.0. Many other packages just specify ppxlib >=[some old version], because that constraint was correct at the time the package was made. (This is where @lukstafi’s comment about revising dependency metadata comes in.)
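The difference between an accurate upper bound and a stale open-ended one can be sketched with a toy version picker in OCaml. This is not how opam's solver works internally; the `constr` record, the helper names and the concrete bounds are assumptions chosen for illustration.

```ocaml
(* Toy model only: opam's real solver is far more general. *)
type constr = { lower : string option; upper : string option }

let parse_version (s : string) : int list =
  List.map int_of_string (String.split_on_char '.' s)

(* Does version [v] fall inside the constraint [c]? *)
let satisfies (v : string) (c : constr) : bool =
  let pv = parse_version v in
  let lower_ok =
    match c.lower with
    | Some lo -> compare pv (parse_version lo) >= 0
    | None -> true
  in
  let upper_ok =
    match c.upper with
    | Some hi -> compare pv (parse_version hi) < 0
    | None -> true
  in
  lower_ok && upper_ok

(* Newest version accepted by every constraint, if any. *)
let newest_common (versions : string list) (cs : constr list) : string option =
  let ok = List.filter (fun v -> List.for_all (satisfies v) cs) versions in
  let newest_first =
    List.sort (fun a b -> compare (parse_version b) (parse_version a)) ok
  in
  match newest_first with [] -> None | v :: _ -> Some v

(* Illustrative bounds, not copied from any real opam file. *)
let versions = [ "0.34.0"; "0.35.0"; "0.36.0" ]
let capped = { lower = Some "0.28.0"; upper = Some "0.36.0" } (* accurate bound *)
let stale = { lower = Some "0.14.0"; upper = None }           (* never revised *)
let needs_new = { lower = Some "0.36.0"; upper = None }
```

With these values, `newest_common versions [capped; stale]` selects 0.35.0, whereas `newest_common versions [stale]` happily selects 0.36.0 even if the package behind `stale` no longer builds with it, and `newest_common versions [capped; needs_new]` finds nothing at all.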
Edit: on review, I see that issue #2 has been well covered in the bug above. Sorry for the rehash!
Speaking with my ppxlib maintainer hat on, this is indeed an unfortunate situation. As others have pointed out, this is a particularly hard problem for ppxlib. The current design means that if we want to offer ppx authors the latest upstream OCaml features, we end up exposing them directly to the parsetree of the language. We are doing our best to patch reverse dependencies.
In theory a package could be compatible with both, but in practice it is quite unlikely. Most packages pattern-match on AST nodes representing functions, which is by far the most likely thing to break, as that AST node completely changed in upstream OCaml (Pexp_fun disappeared and Pexp_function changed completely to represent all kinds of functions).
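To illustrate why the same pattern match cannot compile against both shapes, here is a self-contained OCaml sketch using deliberately simplified stand-in types. `Old_ast` and `New_ast` are hypothetical and far smaller than the real Parsetree; they only mimic the change from one node per parameter to a single node holding all parameters.

```ocaml
(* Simplified stand-ins, NOT the real Parsetree definitions. *)
module Old_ast = struct
  type expr =
    | Ident of string
    | Fun of string * expr              (* like Pexp_fun: one parameter, then a body *)
    | Function of (string * expr) list  (* like the old Pexp_function: cases only *)
end

module New_ast = struct
  type expr =
    | Ident of string
    | Function of string list * body    (* all parameters collected in one node *)
  and body =
    | Body of expr
    | Cases of (string * expr) list
end

(* Counting parameters against the old shape: walk nested Fun nodes. *)
let rec arity_old (e : Old_ast.expr) : int =
  match e with
  | Old_ast.Fun (_, body) -> 1 + arity_old body
  | Old_ast.Function _ -> 1
  | Old_ast.Ident _ -> 0

(* The same question against the new shape needs entirely different patterns. *)
let arity_new (e : New_ast.expr) : int =
  match e with
  | New_ast.Function (params, New_ast.Body _) -> List.length params
  | New_ast.Function (params, New_ast.Cases _) -> List.length params + 1
  | New_ast.Ident _ -> 0

(* [fun x y -> x] in each representation. *)
let old_example = Old_ast.Fun ("x", Old_ast.Fun ("y", Old_ast.Ident "x"))
let new_example = New_ast.Function ([ "x"; "y" ], New_ast.Body (New_ast.Ident "x"))
```

Neither function type-checks against the other module's `expr`, which is the toy version of a PPX that supports one ppxlib release but not the next.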
Thanks @patricoferris for your response and for that PR. I do respect that all engineering projects have tradeoffs and that the ppx team has spent a lot of time thinking them through.
Automation would be great, but I also wonder if this is something that the community at large could help with. It seems like most metadata updates in the opam repository are written either by package maintainers or OCaml platform maintainers. But if the opam team is happy to accept PRs from non-maintainers (and advertises that fact), maybe it would help reduce the breakages that slip through the cracks.
Here’s a more ambitious idea that starts with an observation: for timely resolution of breakages, it could be much easier to modify a package’s build flags than to patch the source. Suppose that a popular package with module X has breaking changes, but that it can provide the legacy API under a module XCompat.V1.X for a few releases. Then other packages awaiting V2 API updates can still use V1 for a while if -open XCompat.V1 is appended to their build flags. To actually implement this build change, I imagine that opam would need a way to store the flags in its package metadata and pass them on to the compiler.
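A minimal single-file OCaml sketch of that layout follows. `X` and `XCompat` are the placeholder names from the paragraph above, the `parse` function and its V1/V2 signatures are invented for illustration, and the explicit `open` stands in for what passing `-open XCompat.V1` to the compiler would do without touching the client's source.

```ocaml
(* Hypothetical library: [X], [XCompat] and [parse] are illustrative only. *)

(* New (V2) API: [parse] returns a result instead of raising. *)
module X = struct
  let parse (s : string) : (int, string) result =
    match int_of_string_opt s with
    | Some n -> Ok n
    | None -> Error ("not an integer: " ^ s)
end

(* Legacy (V1) API, re-implemented on top of V2 and shipped alongside it. *)
module XCompat = struct
  module V1 = struct
    module X = struct
      (* Inside this definition, [X.parse] still refers to the V2 module above. *)
      let parse (s : string) : int =
        match X.parse s with
        | Ok n -> n
        | Error msg -> failwith msg
    end
  end
end

(* A client written against V1. The [open] shadows [X] with the legacy module,
   which is the effect the extra build flag would have. *)
module Old_client = struct
  open XCompat.V1

  let double (s : string) : int = 2 * X.parse s
end

(* A client already updated to V2 is unaffected. *)
module New_client = struct
  let double (s : string) : (int, string) result =
    Result.map (fun n -> 2 * n) (X.parse s)
end
```

The sketch only works when the legacy API can be expressed in terms of the new one; where that is not possible, the compatibility module would have to carry its own implementation.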
Obviously, supporting two APIs in parallel does not come for free, and “update the world” may well be preferable in cases where breakages are limited in scope. And the changes to opam would have significant implications (i.e., treating opam not just as a database of packages, but as a distribution of packages that have been tweaked to work well together, a little bit like Haskell’s Stackage). But here again, the beauty of a low-overhead way to fix API breakages is that the effort can be more easily crowdsourced.
You are raising perfectly valid concerns. When we initially decided to use this “update the world” approach, the AST was much more stable than it has been over the last few compiler releases.
The main problem here is that, though we do send patches upstream, we do not control all reverse dependencies and cannot ensure that compatible versions are released on time. We’re merely “helping out”, but this doesn’t guarantee the stability of the ecosystem.
Now might be a good time to think of a new approach and a slightly more stable API. I’ve been thinking about it and I would like to submit a proposal to the community soon.
Thanks @NathanReb. Your message made me think of a recent experience I had with ppxlib that I thought I would share.
While writing a ppx extender, I wanted to avoid the maintenance issues that come with depending on a specific version of Parsetree, but without having to sacrifice pattern matching in many places. The approach I chose was to define two intermediate representations (one for expressions and one for structure_items) and convert each relevant parse tree node into IR form early in the parsing flow, via Ast_pattern. Most of the IR variants keep references to the original parse tree nodes, but as opaque values for use during the expansion phase; anything I need to know about the structure of the node is encoded in the IR itself.
This was helpful in multiple ways: I could pattern-match on these types without worrying that the constructors would change out from under me, and the match cases were easier to write (because the IR types are much simpler than corresponding Parsetree types).
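For readers who want a picture of the shape of this, here is a toy OCaml sketch. It does not use ppxlib or Ast_pattern: `Toy_parsetree` is a made-up stand-in for the real parse tree, and the `ir` type, `to_ir` and `describe` are hypothetical names. The point is only that a single boundary function looks at parse tree constructors, while everything downstream matches on the IR.

```ocaml
(* Toy stand-in for a parse tree; the real Parsetree is far larger. *)
module Toy_parsetree = struct
  type expr =
    | Const of int
    | Var of string
    | Apply of expr * expr list
    | Let of string * expr * expr
end

(* IR: only the structure the expander cares about. Original nodes are
   carried along but treated as opaque by the rest of the code. *)
type ir =
  | Call of { name : string; args : Toy_parsetree.expr list }
  | Other of Toy_parsetree.expr

(* The only function that inspects parse tree constructors. If the parse
   tree changes shape, this is the one place that needs updating. *)
let to_ir (e : Toy_parsetree.expr) : ir =
  match e with
  | Toy_parsetree.Apply (Toy_parsetree.Var name, args) -> Call { name; args }
  | other -> Other other

(* Downstream logic pattern-matches on the IR, never on the parse tree. *)
let describe (e : Toy_parsetree.expr) : string =
  match to_ir e with
  | Call { name; args } -> Printf.sprintf "call %s/%d" name (List.length args)
  | Other _ -> "other"

let () =
  let open Toy_parsetree in
  print_endline (describe (Apply (Var "f", [ Const 1; Const 2 ])))
```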
Using IRs to break up complex transformations isn’t a new idea of course. And yet, as others have pointed out, it’s common for PPX implementations to just pattern-match on the parse tree directly. I don’t know the reasons for that, but I suspect that one factor is the up-front cost of setting up an IR, even if it pays off in the long run.
So, since the discussion has turned to possible changes in ppxlib: I wonder if there’s some way for the API to gently nudge developers toward expressing parse results with their own IRs, as a way of weaning them off of direct Parsetree dependencies for pattern-matching.