Handling future AST changes in ppxlib
The OCaml 5.2 compiler release has introduced changes in core parts of the AST types. Reflecting those changes when we bumped the internal AST used by ppxlib in 0.36.0 caused breakage in a lot of reverse dependencies. Despite our efforts to keep the ecosystem up to date, it has led to a split in the opam universe between packages that are compatible with 0.36.0 and above and those that aren’t.
Looking at the 5.3 and 5.4 AST changes, we cannot reasonably keep the same “update the universe” approach going forward. I would like to propose a slightly different but much more stable and sustainable approach.
I think it’s important to have a bit of context on why ppxlib is designed the way it is and how we’ve been handling new compiler releases over the past few years to understand this new approach and how it’s going to improve the situation.
The next section of this post will summarize this. If you’re already familiar with ppxlib’s history and design choices, please skip ahead to the Proposed Approach section.
Ppxlib and compiler releases: How it works today
Ppxlib internal AST
Before ppxlib there was ocaml-migrate-parsetree. OMP had the advantage of providing a stable API for ppx authors. Each ppx would select a fixed version of the AST and be implemented as a full AST rewrite, i.e. a structure -> structure or signature -> signature function.
This had the advantage of making ppx-es forward compatible, as OMP maintainers would add support for new compiler releases in the form of a new module containing the AST types for this version and migration functions to convert to/from the types matching the previous compiler version.
OMP also came with a ppx driver, i.e. a program in charge of applying a set of ppx-es on a given AST or source file and spitting out the final preprocessed AST for the compiler. The driver was responsible for migrating the AST from the compiler’s version to the one used by a ppx. Because each ppx could require a different AST version, it also potentially had to migrate the AST transformed by a ppx before passing it on to the next one.
This had a few disadvantages though:
- poor performance as the AST was traversed and migrated (i.e. copied) several times through the course of a single driver run.
- transformation semantics issues: the order in which ppx-es were applied was uncertain, or rather tied to the set of ppx versions used. That meant that updating one ppx could change its “turn” and result in a different AST returned by the driver. This also did not allow ppx-es to interact together reliably.
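To make the performance point concrete, here is a minimal sketch of an OMP-style driver on a toy AST. Everything in it (the `V1`/`V2` modules, `run_driver`, the `copies` counter) is invented for illustration and is not OMP's actual API; it only shows how each ppx pinned to another AST version costs two full copies of the tree.

```ocaml
(* Toy model: two AST "versions" with the same shape but distinct types. *)
module V1 = struct
  type expr = Int of int | Add of expr * expr
end

module V2 = struct
  type expr = Int of int | Add of expr * expr
end

(* Migrations rebuild the whole tree. [copies] counts full-tree copies. *)
let copies = ref 0

let rec up : V1.expr -> V2.expr = function
  | V1.Int n -> V2.Int n
  | V1.Add (a, b) -> V2.Add (up a, up b)

let rec down : V2.expr -> V1.expr = function
  | V2.Int n -> V1.Int n
  | V2.Add (a, b) -> V1.Add (down a, down b)

(* A ppx is a whole-AST rewrite pinned to one AST version. *)
type ppx =
  | On_v1 of (V1.expr -> V1.expr)
  | On_v2 of (V2.expr -> V2.expr)

(* The driver keeps the AST at the compiler's version (V2 here) and migrates
   around every ppx written against another version. *)
let run_driver ppxs ast =
  List.fold_left
    (fun ast ppx ->
      match ppx with
      | On_v2 f -> f ast
      | On_v1 f ->
          copies := !copies + 2;
          up (f (down ast)))
    ast ppxs

(* A V1 ppx that increments every integer literal. *)
let incr_ints =
  let rec go = function
    | V1.Int n -> V1.Int (n + 1)
    | V1.Add (a, b) -> V1.Add (go a, go b)
  in
  On_v1 go

let result = run_driver [ incr_ints; incr_ints ] (V2.Add (V2.Int 1, V2.Int 2))
let () = Printf.printf "full AST copies: %d\n" !copies
```

Running two V1 ppx-es against a V2 compiler already copies the tree four times in this model, on top of the two rewrites themselves.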
ppxlib aimed at fixing those issues by forcing ppx-es to agree on the AST version to use. ppxlib provides its own, fixed AST version that ppx-es have to use. Its driver handles the migration to/from the current compiler and provides a smooth API to write transformations as rewriting rules. The driver then handles the AST rewrite by recursively applying those rules in the right places in a single AST traversal.
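The single-traversal idea can be sketched on a toy AST as follows. This is not ppxlib's implementation or API: the `rule` record and `rewrite` function are hypothetical names, and the choice to re-traverse generated code is an assumption made to keep the example short.

```ocaml
(* Toy expression AST with extension nodes, all on one shared version. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string * expr (* stands for [%name payload] *)

(* A rewriting rule: which extension it handles and how to expand it. *)
type rule = { name : string; expand : expr -> expr }

(* One top-down traversal applies every registered rule where it matches.
   Generated code is traversed again so that rules can compose. *)
let rec rewrite rules = function
  | Int n -> Int n
  | Add (a, b) -> Add (rewrite rules a, rewrite rules b)
  | Extension (name, payload) -> (
      match List.find_opt (fun r -> r.name = name) rules with
      | Some r -> rewrite rules (r.expand payload)
      | None -> Extension (name, rewrite rules payload))

let rules =
  [
    { name = "double"; expand = (fun e -> Add (e, e)) };
    { name = "succ"; expand = (fun e -> Add (e, Int 1)) };
  ]

let input =
  Add (Extension ("double", Int 2), Extension ("succ", Extension ("double", Int 1)))

let output = rewrite rules input
```

Because both rules work on the same AST type and are applied by the same traversal, the result no longer depends on the order in which the ppx-es were registered, and `[%succ [%double 1]]` expands fully in one pass.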
Support for new compilers
Support for new compilers comes in two stages.
Build and preprocess old code with new compiler
This is the most basic support: making sure that users can still build and preprocess their code using the newest compiler, provided they don’t use any of the new language features.
To do this, we add the new AST types and migration functions, just as OMP used to do. This does not allow new features in the code because those cannot be represented with the old AST types and the migration would fail (this was also an existing limitation of OMP).
This is usually released early on, when the compiler is still in beta, and is a non-breaking change: all reverse dependencies still build with this new version.
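As a rough illustration of this limitation, consider a toy pattern AST where the newer version gains one constructor. The `Old`/`New` modules and the `Unsupported` exception are made up for this sketch; the real migration modules are much larger but fail in the same spirit.

```ocaml
module Old = struct
  type pattern = Var of string | Tuple of pattern list
end

module New = struct
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Effect of pattern * pattern (* only exists in the newer AST *)
end

exception Unsupported of string

(* Downward migration: total on old-style code, fails on the new node. *)
let rec down : New.pattern -> Old.pattern = function
  | New.Var x -> Old.Var x
  | New.Tuple ps -> Old.Tuple (List.map down ps)
  | New.Effect _ -> raise (Unsupported "effect pattern")

(* Upward migration never fails: every old node has a new counterpart. *)
let rec up : Old.pattern -> New.pattern = function
  | Old.Var x -> New.Var x
  | Old.Tuple ps -> New.Tuple (List.map up ps)

let old_style = New.Tuple [ New.Var "x"; New.Var "y" ]
let roundtrip_ok = up (down old_style) = old_style

let new_feature_rejected =
  match down (New.Effect (New.Var "e", New.Var "k")) with
  | _ -> false
  | exception Unsupported _ -> true
```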
Support new language features
To support new language features, we bump the AST used by ppxlib. This means new features don’t have to be migrated anymore and are therefore supported.
This does change types that are exposed as part of ppxlib’s API and can cause breakage in reverse dependencies, depending on which parts of the AST were modified and which parts each individual ppx uses explicitly.
We provide tools that can help make ppx code more robust as they allow matching over and producing AST nodes without explicitly referencing the types themselves: metaquot, Ast_builder or Ast_pattern for instance. That’s not always enough though and eventually, those ppx-es have to be updated to be compatible with the latest version of ppxlib.
As was the case for the 5.2 AST bump, when we release such a ppxlib version, we send PRs to help maintainers of our opam reverse dependencies update and carefully add upper bounds to the versions that aren’t compatible anymore.
This worked pretty well for a few years as the AST was relatively stable and the parts that were modified were not directly used by a lot of ppx-es.
A problem with this approach is that even though we can help maintainers go through the update, we cannot release packages in their stead which means that unmaintained ppx-es aren’t compatible anymore no matter how much effort we put into easing the upgrade. It is also often the case that not all ppx-es have a compatible release straight away and this results in a transition period with the opam universe split mentioned in the introduction.
Proposed approach for 5.3 onward
The first part of this plan is to freeze ppxlib’s internal AST for each major version. That means that until we release ppxlib.1.0.0 our internal AST will always be the 5.2 AST.
The second and most important part is to provide complete forward compatibility despite the AST freeze. We will allow migrating new features down to our AST by encoding them inside specific language extensions and migrating them back to their original form before returning the preprocessed AST to the compiler.
This will allow existing ppx-es, without being updated in any way, to be used with new compilers AND to be used in the same files as new language features, as long as they don’t have to directly interact with those features.
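Here is a minimal sketch of that round trip on a toy pattern AST. The extension name `toy.migration.effect`, the payload layout and the `old_ppx` rewriter are all assumptions made for the example; the real encoding is internal to ppxlib and may look different.

```ocaml
module Old = struct
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Extension of string * pattern (* stands for [%name? payload] *)
end

module New = struct
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Extension of string * pattern
    | Effect of pattern * pattern
end

(* Hypothetical reserved extension name used by the migrations. *)
let effect_ext = "toy.migration.effect"

(* Down: the new node is encoded into an extension instead of failing. *)
let rec down : New.pattern -> Old.pattern = function
  | New.Var x -> Old.Var x
  | New.Tuple ps -> Old.Tuple (List.map down ps)
  | New.Extension (n, p) -> Old.Extension (n, down p)
  | New.Effect (e, k) -> Old.Extension (effect_ext, Old.Tuple [ down e; down k ])

(* Up: the encoding is recognised from the extension alone and decoded. *)
let rec up : Old.pattern -> New.pattern = function
  | Old.Var x -> New.Var x
  | Old.Tuple ps -> New.Tuple (List.map up ps)
  | Old.Extension (n, Old.Tuple [ e; k ]) when n = effect_ext ->
      New.Effect (up e, up k)
  | Old.Extension (n, p) -> New.Extension (n, up p)

(* An existing ppx written against the old AST: it renames [x] to [y] and
   leaves extensions it does not know about untouched. *)
let rec old_ppx : Old.pattern -> Old.pattern = function
  | Old.Var "x" -> Old.Var "y"
  | Old.Var v -> Old.Var v
  | Old.Tuple ps -> Old.Tuple (List.map old_ppx ps)
  | Old.Extension _ as ext -> ext

let source = New.Tuple [ New.Var "x"; New.Effect (New.Var "e", New.Var "k") ]
let preprocessed = up (old_ppx (down source))
```

The old ppx does its job on the part of the tree it understands, and the effect pattern comes back out unchanged on the way up.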
We will also provide a stable API to allow ppx-es that would like to add special support for these new features to build and match over such nodes.
You can take a look at the examples below to get an idea of what that would look like for recent language features such as the effect syntax from OCaml 5.3 or the bivariance annotation from OCaml 5.4.
As part of these changes, we will deprecate ppxlib’s copy of Ast_helper in favor of Ast_builder, aiming to remove Ast_helper entirely in 1.0.0. We have been maintaining two distinct modules for quite a while now. Ast_helper also has a tendency to encourage its users to generate all their code with Location.none as their location which makes the life of their users a bit hard when they have to interpret compiler errors.
This can be seen as a middle ground between the approach proposed here 6 years ago (that we gave up on due to its complexity) and the current situation.
Limitations
Encoding new features into extension points is not always easy: only specific parts of the AST can be replaced by an extension point. To keep things under control and prevent ppx-es from generating inconsistent nodes, all new features will always be migrated into an extension point. That means that if the impacted node cannot directly be encoded that way, we will encode the first suitable parent node. In some scenarios, that can climb up the AST types quite significantly, potentially all the way to the structure_item/signature_item. This means that new features won’t be equal when it comes to how easy it is to use them in conjunction with some ppx-es. It’s important to keep in mind that this is still a net improvement as it was previously not possible to use them together at all.
Similarly, providing a nice API to allow building and destructing encoded new features will vastly depend on the features themselves and how entangled they are with new AST types. We will likely not always expose such builder/destructor pairs and might only add some of them if the demand is high enough.
It is also part of the reason why we will probably still bump our AST at some points in time even if much less frequently than we have in the past. When that eventually happens, we will be able to maintain the previous major versions for quite a while as this will just be a matter of adding our newest migrations there as well.
Effect syntax example
OCaml 5.3 introduced the following syntax:
match f () with
| v -> Complete v
| effect (Xchg msg), k ->
    ...
This special effect pattern is represented in the 5.3 AST with the Ppat_effect variant:
| Ppat_effect of pattern * pattern
We cannot represent this in the 5.2 AST and previously, any attempt at migrating such a node down would have failed. With this new approach we instead migrate it to something along these lines:
[%ppxlib.migration.ppat_effect? (Xchg msg, k)]
and the upward migration knows to translate this to the right Ppat_effect node. This migration needs to work without context outside the extension so that any ppx that would unknowingly copy such a node elsewhere in the AST would not cause an uninterpreted extension error later on during the compilation.
If this is passed down to an existing ppx as part of its payload and it tries to interpret it, it should fail as it won’t know what to do with such an extension.
Note that ppx authors should never rely on the actual extension point encoding: we reserve the right to change that encoding as part of minor or patch releases of ppxlib. Such nodes should be left untouched or dealt with using the stable API described below.
Now if a ppx author needs to add explicit support for effects they will be able to use something like:
val ppat_effect : loc: location -> pattern -> pattern -> pattern
from Ast_builder to generate such a node. Of course if your ppx generates an effect pattern with an older compiler, this will lead to a compile error as the extension won’t be translated unless migrated back up. Authors will have to be mindful of this and properly document when/how they’ll generate newer nodes and eventually restrict their ppx to the right range of compilers.
These will likely come with a “destruct” version in Ast_pattern. For the effect pattern it should look like:
val ppat_effect : (pattern * pattern, 'a, 'b) t -> (pattern, 'a, 'b) t
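To illustrate why going through such a builder/destructor pair is safer than matching on the encoding, here is a toy version with simplified types. The `Stable` module, `as_effect` and the extension name are hypothetical; the real functions would live in Ast_builder and Ast_pattern with the signatures given above, and the destructor here returns an option rather than an Ast_pattern combinator to keep the sketch short.

```ocaml
(* Simplified stand-in for the frozen pattern type. *)
type pattern =
  | Var of string
  | Tuple of pattern list
  | Extension of string * pattern

(* The encoding is hidden behind an abstract interface, so it can change
   without breaking any ppx that only uses these two functions. *)
module Stable : sig
  val ppat_effect : pattern -> pattern -> pattern
  val as_effect : pattern -> (pattern * pattern) option
end = struct
  let name = "toy.migration.effect" (* hypothetical, private to the library *)
  let ppat_effect e k = Extension (name, Tuple [ e; k ])

  let as_effect = function
    | Extension (n, Tuple [ e; k ]) when n = name -> Some (e, k)
    | _ -> None
end

(* ppx-side code: find the continuation variable of an effect pattern
   without ever mentioning the extension name or payload layout. *)
let continuation_name p =
  match Stable.as_effect p with
  | Some (_, Var k) -> Some k
  | _ -> None

let built = Stable.ppat_effect (Var "Xchg") (Var "k")
```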
Bivariant type parameter example
This example is probably a bit of a stretch as it is a very niche syntax change and is highly unlikely to actually be used in the wild, but it makes a good example of a feature that is hard to encode.
In OCaml 5.4, a new variant was added to the Asttypes.variance type: Bivariant. The variance type is used in the AST to describe how a type parameter behaves relative to the type itself. This can be manually annotated for each parameter when writing a type declaration or a class.
The Bivariant case is a bit of a special one as a parameter can only be Bivariant (i.e. covariant AND contravariant) with the type if it does not actually appear in the concrete type definition, that is in cases such as:
type 'a t = A
For reasons that we won’t expand upon here, 5.4 introduced the following syntax to allow one to explicitly annotate a parameter as bivariant:
type +-'a t
The problem is that the variance cannot be replaced directly by an extension point, see the type type_declaration for instance:
and type_declaration =
  {
    ptype_name: string loc;
    ptype_params: (core_type * (variance * injectivity)) list;
                                ^^^^^^^^
      (** [('a1,...'an) t] *)
    ptype_cstrs: (core_type * core_type * Location.t) list;
      (** [... constraint T1=T1' ... constraint Tn=Tn'] *)
    ptype_kind: type_kind;
    ptype_private: private_flag;  (** for [= private ...] *)
    ptype_manifest: core_type option;  (** represents [= T] *)
    ptype_attributes: attributes;  (** [... [\@\@id1] [\@\@id2]] *)
    ptype_loc: Location.t;
  }
In this example we have to encode the entire parent node of the type declaration as an extension point.
This means that it spreads to quite a few places: type_declaration can be found in structure_items, signature_items and inside some module_type nodes as well.
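The “climb to the first suitable parent” step can be sketched on a heavily simplified model. All types below are toy stand-ins, and the payload layout (the old declaration plus the indices of bivariant parameters) is invented for the example; it is not how ppxlib encodes it.

```ocaml
module Old = struct
  type variance = Covariant | Contravariant | NoVariance
  type type_declaration = { name : string; params : (string * variance) list }

  type structure_item =
    | Type of type_declaration
    (* Stands in for an extension point wrapping the whole item: the
       declaration with placeholders, plus indices of bivariant params. *)
    | Extension of type_declaration * int list
end

module New = struct
  type variance = Covariant | Contravariant | NoVariance | Bivariant
  type type_declaration = { name : string; params : (string * variance) list }
  type structure_item = Type of type_declaration
end

(* [variance] has no extension case, so the encoding happens one level up,
   at the structure item that contains the declaration. *)
let down_item (New.Type d) : Old.structure_item =
  let down_variance = function
    | New.Covariant -> Old.Covariant
    | New.Contravariant -> Old.Contravariant
    | New.NoVariance | New.Bivariant -> Old.NoVariance (* placeholder *)
  in
  let bivariant =
    List.concat
      (List.mapi
         (fun i (_, v) -> if v = New.Bivariant then [ i ] else [])
         d.New.params)
  in
  let params = List.map (fun (p, v) -> (p, down_variance v)) d.New.params in
  let decl = { Old.name = d.New.name; params } in
  if bivariant = [] then Old.Type decl else Old.Extension (decl, bivariant)

let up_item (item : Old.structure_item) : New.structure_item =
  let up_variance = function
    | Old.Covariant -> New.Covariant
    | Old.Contravariant -> New.Contravariant
    | Old.NoVariance -> New.NoVariance
  in
  let d, bivariant =
    match item with Old.Type d -> (d, []) | Old.Extension (d, bs) -> (d, bs)
  in
  let params =
    List.mapi
      (fun i (p, v) ->
        (p, if List.mem i bivariant then New.Bivariant else up_variance v))
      d.Old.params
  in
  New.Type { New.name = d.Old.name; params }

let source =
  New.Type { New.name = "t"; params = [ ("a", New.Bivariant); ("b", New.Covariant) ] }

let encoded = down_item source
```

A ppx that derives code from type declarations would see an opaque extension here instead of a `Type` item, which is exactly the loss of convenience described above.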
Given there’s very little to no use for this syntax, we won’t be providing any function to build or destruct such nodes initially.