[ANN] Ppxlib: Support for future compilers

Handling future AST changes in ppxlib

The OCaml 5.2 compiler release has introduced changes in core parts of the AST types. Reflecting those changes when we bumped the internal AST used by ppxlib in 0.36.0 caused breakage in a lot of reverse dependencies. Despite our efforts to keep the ecosystem up to date, it has led to a split in the opam universe between packages that are compatible with 0.36.0 and above and those that aren’t.

Looking at the 5.3 and 5.4 AST changes, we cannot reasonably keep the same “update the universe” approach going forward. I would like to propose a slightly different but much more stable and sustainable approach.

I think it’s important to have a bit of context on why ppxlib is designed the way it is and how we’ve been handling new compiler releases over the past few years to understand this new approach and how it’s going to improve the situation.

The next section of this post will summarize this. If you’re already familiar with ppxlib’s history and design choices, please skip ahead to the Proposed Approach section.

Ppxlib and compiler releases: How it works today

Ppxlib internal AST

Before ppxlib there was ocaml-migrate-parsetree. OMP had the advantage of providing a stable API for ppx authors. Each ppx would select a fixed version of the AST and be implemented as a full AST rewrite, i.e. a structure -> structure or signature -> signature function.

This had the advantage of making ppx-es forward compatible, as OMP maintainers would add support for new compiler releases in the form of a new module containing the AST types for this version and migration functions to convert to/from the types matching the previous compiler version.

OMP also came with a ppx driver, i.e. a program in charge of applying a set of ppx-es to a given AST or source file and outputting the final preprocessed AST for the compiler. The driver was responsible for migrating the AST from the compiler’s version to the one used by a ppx. Because each ppx could require a different AST version, it also potentially had to migrate the AST transformed by a ppx before passing it on to the next one.

This had a few disadvantages though:

  1. poor performance as the AST was traversed and migrated (i.e. copied) several times through the course of a single driver run.
  2. transformation semantics issues: the order in which ppx-es were applied was uncertain, or rather tied to the set of ppx versions used. That meant that updating one ppx could change its “turn” and result in a different AST returned by the driver. This also did not allow ppx-es to interact together reliably.

ppxlib aimed at fixing those issues by forcing ppx-es to agree on the AST version to use. ppxlib provides its own, fixed AST version that ppx-es have to use. Its driver handles the migration to/from the current compiler and provides a smooth API to write transformations as rewriting rules. The driver then handles the AST rewrite by recursively applying those rules in the right places in a single AST traversal.
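To make the "rewriting rules applied in a single traversal" idea more concrete, here is a toy model in plain OCaml. It does not use ppxlib: the `expr` type, the `rule` record and `apply_rules` are all hypothetical stand-ins, and the real driver is considerably more involved. It only sketches the shape of the idea, assuming rules that fire on named extension nodes.

```ocaml
(* Toy model only: these types and names are hypothetical, not ppxlib's. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Ext of string * expr list  (* stands in for an extension node *)

(* A rewriting rule: fires on extension nodes with a given name. *)
type rule = { name : string; expand : expr list -> expr option }

(* One top-down pass: expand a matching extension and keep rewriting the
   result; leave unknown extensions (and their payload) untouched. *)
let rec apply_rules rules e =
  match e with
  | Int _ -> e
  | Add (a, b) -> Add (apply_rules rules a, apply_rules rules b)
  | Ext (name, args) -> (
      match List.find_opt (fun r -> r.name = name) rules with
      | None -> e
      | Some r -> (
          match r.expand args with
          | None -> e
          | Some e' -> apply_rules rules e'))

(* A sample rule rewriting [Ext ("double", [x])] into [Add (x, x)]. *)
let double =
  { name = "double";
    expand = (function [ x ] -> Some (Add (x, x)) | _ -> None) }
```

In this sketch, every rule sees the same `expr` type, so no copy of the tree is needed between two rules, which is the property the paragraph above describes.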

Support for new compilers

Support for new compilers comes in two stages.

Build and preprocess old code with new compiler

This is the most basic support: making sure that one can still build and preprocess their code using the newest compiler, provided they don’t use any of the new language features.

To do this, we add the new AST types and migration functions, just as OMP used to do. This does not allow new features in the code because those cannot be represented with the old AST types and the migration would fail (This was also an existing limitation of OMP).
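As a rough illustration of why the downward migration fails on new features, here is a toy model with two hand-written "AST versions". The `Old` and `New` modules, the `Unsupported` exception and both migration functions are hypothetical and much simpler than the real generated migrations.

```ocaml
(* Toy model: two AST "versions" as plain variants. Not ppxlib's types. *)
module Old = struct
  type pattern = Var of string | Tuple of pattern list
end

module New = struct
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Effect of pattern * pattern  (* only exists in the newer version *)
end

exception Unsupported of string

(* Downward migration: copies every node and fails on anything the old
   version cannot represent. *)
let rec migrate_down : New.pattern -> Old.pattern = function
  | New.Var x -> Old.Var x
  | New.Tuple ps -> Old.Tuple (List.map migrate_down ps)
  | New.Effect _ -> raise (Unsupported "effect pattern")

(* Upward migration: always succeeds, the new version is a superset. *)
let rec migrate_up : Old.pattern -> New.pattern = function
  | Old.Var x -> New.Var x
  | Old.Tuple ps -> New.Tuple (List.map migrate_up ps)
```

Old code only contains nodes that both versions know about, so it round-trips; a single new node anywhere in the tree makes the whole migration fail.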

This is usually released early on, when the compiler is still in beta, and is a non-breaking change: all reverse dependencies still build with this new version.

Support new language features

To support new language features, we bump the AST used by ppxlib. This means new features don’t have to be migrated anymore and are therefore supported.

This does change types that are exposed as part of ppxlib’s API and can cause breakage in reverse dependencies, depending on which parts of the AST were modified and which parts each individual ppx uses explicitly.

We provide tools that can help make ppx code more robust as they allow matching over and producing AST nodes without explicitly referencing the types themselves: metaquot, Ast_builder or Ast_pattern for instance. That’s not always enough though and eventually, those ppx-es have to be updated to be compatible with the latest version of ppxlib.

As was the case for the 5.2 AST bump, when we release such a ppxlib version, we send PRs to help maintainers of our opam reverse dependencies update and carefully add upper bounds to the versions that aren’t compatible anymore.

This worked pretty well for a few years as the AST was relatively stable and the parts that were modified were not directly used by a lot of ppx-es.

A problem with this approach is that even though we can help maintainers go through the update, we cannot release packages in their stead which means that unmaintained ppx-es aren’t compatible anymore no matter how much effort we put into easing the upgrade. It is also often the case that not all ppx-es have a compatible release straight away and this results in a transition period with the opam universe split mentioned in the introduction.

Proposed approach for 5.3 onward

The first part of this plan is to freeze ppxlib’s internal AST for each major version. That means that until we release ppxlib.1.0.0 our internal AST will always be the 5.2 AST.

The second and most important part is to provide complete forward compatibility despite the AST freeze. We will allow migrating new features down to our AST by encoding them inside specific language extensions and migrating them back to their original form before returning the preprocessed AST to the compiler.

This will allow existing ppx-es, without being updated in any way, to be used with new compilers AND to be used in the same files as new language features, as long as they don’t have to directly interact with those features.

We will also provide a stable API to allow ppx-es that would like to add special support for these new features to build and match over such nodes.

You can take a look at the examples below to get an idea of what that would look like for recent language features such as the effect syntax from OCaml 5.3 or the bivariance annotation from OCaml 5.4.

As part of these changes, we will deprecate ppxlib’s copy of Ast_helper in favor of Ast_builder, aiming to remove Ast_helper entirely in 1.0.0. We have been maintaining two distinct modules for quite a while now. Ast_helper also has a tendency to encourage its users to generate all their code with Location.none as their location which makes the life of their users a bit hard when they have to interpret compiler errors.

This can be seen as a middle ground between the approach proposed here 6 years ago (that we gave up on due to its complexity) and the current situation.

Limitations

Encoding new features into extension points is not always easy: only specific parts of the AST can be replaced by an extension point. To keep things under control and prevent ppx-es from generating inconsistent nodes, all new features will always be migrated into an extension point. That means that if the impacted node cannot directly be encoded that way, we will encode the first suitable parent node. In some scenarios, that can climb up the AST types quite significantly, potentially all the way to the structure_item/signature_item. This means that new features won’t be equal when it comes to how easy it is to use them in conjunction with some ppx-es. It’s important to keep in mind that this is still a net improvement as it was previously not possible to use them together at all.

Similarly, providing a nice API to allow building and destructing encoded new features will vastly depend on the features themselves and how entangled they are with new AST types. We will likely not always expose such builder/destructor pairs and might only add some of them if the demand is high enough.

It is also part of the reason why we will probably still bump our AST at some points in time even if much less frequently than we have in the past. When that eventually happens, we will be able to maintain the previous major versions for quite a while as this will just be a matter of adding our newest migrations there as well.

Effect syntax example

OCaml 5.3 introduced the following syntax:

match f () with
| v -> Complete v
| effect (Xchg msg), k ->
  ...

This special effect pattern is represented in the 5.3 AST with the Ppat_effect variant:

  | Ppat_effect of pattern * pattern

We cannot represent this in the 5.2 AST and previously, any attempt at migrating such a node down would have failed. With this new approach we instead migrate it to something along those lines:

[%ppxlib.migration.ppat_effect? (Xchg msg, k)]

and the upward migration knows to translate this to the right Ppat_effect node. This migration needs to work without context outside the extension so that any ppx that would unknowingly copy such a node elsewhere in the AST would not cause an uninterpreted extension error later on during the compilation.
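The round trip can be sketched with a toy model. Everything here is hypothetical: the `Old`/`New` modules, the `toy.migration.ppat_effect` name and the tuple payload are illustrative assumptions, not the encoding ppxlib actually uses.

```ocaml
(* Toy model of the encoding idea; names and payload shape are made up. *)
module Old = struct
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Extension of string * pattern  (* [%name? payload] *)
end

module New = struct
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Extension of string * pattern
    | Effect of pattern * pattern
end

let effect_ext = "toy.migration.ppat_effect"

(* Down: the new node is hidden inside an extension the old AST can hold. *)
let rec down : New.pattern -> Old.pattern = function
  | New.Var x -> Old.Var x
  | New.Tuple ps -> Old.Tuple (List.map down ps)
  | New.Extension (n, p) -> Old.Extension (n, down p)
  | New.Effect (eff, k) ->
      Old.Extension (effect_ext, Old.Tuple [ down eff; down k ])

(* Up: decoding only looks at the extension itself, never at its context,
   so an encoded node that was copied elsewhere is still recognised. *)
let rec up : Old.pattern -> New.pattern = function
  | Old.Var x -> New.Var x
  | Old.Tuple ps -> New.Tuple (List.map up ps)
  | Old.Extension (n, Old.Tuple [ eff; k ]) when n = effect_ext ->
      New.Effect (up eff, up k)
  | Old.Extension (n, p) -> New.Extension (n, up p)
```

Because `up` pattern-matches on the extension alone, a rewriter that moves the encoded node into another tuple, for instance, still gets a proper `Effect` node back in this sketch.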

If this is passed down to an existing ppx as part of its payload and it tries to interpret it, it should fail as it won’t know what to do with such an extension.

Note that ppx authors should never rely on the actual extension point encoding: we reserve the right to change that encoding as part of minor or patch releases of ppxlib. Such nodes should be left untouched or dealt with using the stable API described below.

Now if a ppx author needs to add explicit support for effects they will be able to use something like:

val ppat_effect : loc: location -> pattern -> pattern -> pattern

from Ast_builder to generate such a node. Of course if your ppx generates an effect pattern with an older compiler, this will lead to a compile error as the extension won’t be translated unless migrated back up. Authors will have to be mindful of this and properly document when/how they’ll generate newer nodes and eventually restrict their ppx to the right range of compilers.

These will likely come with a “destruct” version in Ast_pattern. For the effect pattern it should look like:

val ppat_effect : (pattern * pattern, 'a, 'b) t -> (pattern, 'a, 'b) t
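To show why a builder/destructor pair insulates a ppx from the encoding, here is a self-contained toy version. The `Toy_ast` module, `as_effect` and the extension name are hypothetical; the real functions would live in Ast_builder and Ast_pattern with the signatures quoted above, and their implementation is not specified in this post.

```ocaml
(* Toy model; the module, its functions and its encoding are hypothetical. *)
module Toy_ast : sig
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Extension of string * pattern

  val ppat_effect : pattern -> pattern -> pattern
  (* Build an encoded effect pattern. *)

  val as_effect : pattern -> (pattern * pattern) option
  (* Destruct an encoded effect pattern, if the node is one. *)
end = struct
  type pattern =
    | Var of string
    | Tuple of pattern list
    | Extension of string * pattern

  (* The only place that knows the encoding; it can change freely. *)
  let name = "toy.migration.ppat_effect"
  let ppat_effect eff k = Extension (name, Tuple [ eff; k ])

  let as_effect = function
    | Extension (n, Tuple [ eff; k ]) when n = name -> Some (eff, k)
    | _ -> None
end

(* Client code never mentions the extension name or payload shape. *)
let continuation_name p =
  match Toy_ast.as_effect p with
  | Some (_, Toy_ast.Var k) -> Some k
  | _ -> None
```

If the encoding inside `Toy_ast` changed, `continuation_name` would keep compiling and behaving the same way, which is the guarantee a stable API is meant to give.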

Bivariant type parameter example

This example is probably a bit of a stretch as it is a very niche syntax change and is highly unlikely to actually be used in the wild, but it makes a good example of a feature that is hard to encode.

In OCaml 5.4, a new variant was added to the Asttypes.variance type: Bivariant. The variance type is used in the AST to describe how a type parameter behaves relative to the type itself. This can be manually annotated for each parameter when writing a type declaration or a class.

The Bivariant case is a bit of a special one as a parameter can only be Bivariant (i.e. covariant AND contravariant) with the type if it does not actually appear in the concrete type definition, that is in cases such as:

type 'a t = A

For reasons that we won’t expand upon here, 5.4 introduced the following syntax to allow one to explicitly annotate a parameter as bivariant:

type +-'a t

The problem is that the variance cannot be replaced directly by an extension point, see the type type_declaration for instance:

  and type_declaration =
      {
       ptype_name: string loc;
       ptype_params: (core_type * (variance * injectivity)) list;
                                   ^^^^^^^^
        (** [('a1,...'an) t] *)
       ptype_cstrs: (core_type * core_type * Location.t) list;
        (** [... constraint T1=T1'  ... constraint Tn=Tn'] *)
       ptype_kind: type_kind;
       ptype_private: private_flag;  (** for [= private ...] *)
       ptype_manifest: core_type option;  (** represents [= T] *)
       ptype_attributes: attributes;  (** [... [\@\@id1] [\@\@id2]] *)
       ptype_loc: Location.t;
      }

In this example we have to encode the entire parent node of the type declaration as an extension point.

This means that it spreads to quite a few places: type_declaration can be found in structure_items, signature_items and inside some module_type nodes as well.
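The "climb to the first suitable parent" rule can be sketched with a toy model. All names here are hypothetical, and marking the parameter with an attribute inside the wrapping extension is just one possible way to keep the information; it is an assumption made for this sketch, not ppxlib's actual encoding.

```ocaml
(* Toy model: the variance slot cannot hold an extension, items can. *)
module Old = struct
  type variance = Covariant | Contravariant | NoVariance
  type param = { var : string; variance : variance; attrs : string list }
  type item =
    | Type of string * param list  (* type name, parameters *)
    | Extension of string * item   (* [%%name item] *)
end

module New = struct
  type variance = Covariant | Contravariant | NoVariance | Bivariant
  type param = { var : string; variance : variance }
  type item = Type of string * param list | Extension of string * item
end

let marker = "toy.bivariant"
let ext_name = "toy.migration.type_decl_with_bivariant_parameter"

let down_param (p : New.param) : Old.param =
  let variance, attrs =
    match p.New.variance with
    | New.Covariant -> (Old.Covariant, [])
    | New.Contravariant -> (Old.Contravariant, [])
    | New.NoVariance -> (Old.NoVariance, [])
    | New.Bivariant -> (Old.NoVariance, [ marker ])
  in
  { Old.var = p.New.var; variance; attrs }

(* The whole item is wrapped as soon as one parameter needs encoding. *)
let rec down_item : New.item -> Old.item = function
  | New.Extension (n, i) -> Old.Extension (n, down_item i)
  | New.Type (name, params) ->
      let decl = Old.Type (name, List.map down_param params) in
      let needs_encoding =
        List.exists (fun p -> p.New.variance = New.Bivariant) params
      in
      if needs_encoding then Old.Extension (ext_name, decl) else decl

let up_param ~encoded (p : Old.param) : New.param =
  let variance =
    match p.Old.variance with
    | Old.Covariant -> New.Covariant
    | Old.Contravariant -> New.Contravariant
    | Old.NoVariance ->
        (* The marker only counts inside the dedicated extension. *)
        if encoded && List.mem marker p.Old.attrs then New.Bivariant
        else New.NoVariance
  in
  { New.var = p.Old.var; variance }

let rec up_item : Old.item -> New.item = function
  | Old.Extension (n, Old.Type (name, params)) when n = ext_name ->
      New.Type (name, List.map (up_param ~encoded:true) params)
  | Old.Extension (n, i) -> New.Extension (n, up_item i)
  | Old.Type (name, params) ->
      New.Type (name, List.map (up_param ~encoded:false) params)
```

Declarations without a bivariant parameter migrate as plain `Type` items, so only the rare declarations that use the new syntax become opaque to existing ppx-es.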

Given there’s very little to no use for this syntax, we won’t be providing any function to build or destruct such nodes initially.


Thanks a lot for this amazing write-up, @NathanReb!

I haven’t been involved anymore in the PPX world for a while, but I still remember the pain, and I think exploring different approaches to what we used to do is the way forward! As an anecdote: I still remember the AST bump of ppxlib to OCaml 4.14. Back then, we decided to add a stable layer to Ast_builder (and Ast_helper). Not a perfect solution either, but without it, we would have broken over 40 PPXs in that one single AST bump (the Ppat_construct node changed back then)! I imagine that the bump to the OCaml 5.2 AST was even worse.

We will allow migrating new features down to our AST by encoding them inside specific language extensions and migrating them back to their original form before returning the preprocessed AST to the compiler.

Already back when I worked on ppxlib, we used to use this approach in some concrete cases, and I think applying it on a large scale will almost certainly lead to occasional bugs. But I agree that despite the potential bugs, the payoff will be positive.


And btw, I’ve also meant to say for a while: Thanks for all the impressive work that both you and @patricoferris are doing for the PPX ecosystem! I think that kind of work is tedious and often unseen :heart:


Great to see some improvement efforts in this area! I think this is a very hard problem to solve.

I’m a bit curious about the “there is no extension node at this place” case.
Let’s take the example of the new bivariant variance (and let’s suppose that the feature is useful!).

If someone writes:

type t1 = string
and +-'a t = A [@@deriving yojson]

one could expect that to work, at least for t1, as the yojson deriver does not care about variance. However, I guess it’s going to be migrated as something that looks like

[%%ppxlib.type_with_bivariant_variance
  type t1 = string
  and 'a t = A [@@deriving yojson]]

  • Will the yojson deriver be run inside this extension node?
  • If yes, suppose it needs to know whether the parameter is bivariant or not: how can it call an Ast_pattern function to know this, given that it is inside the extension node?

What would be the downside of using attributes (as is already the case in some places of ppxlib migrations)? Attributes can be placed in more parts of the code, so one could rewrite the above as:

type t1 = string
and 'a t = A [@ppxlib.variance Bivariant] [@@deriving yojson]

and the yojson deriver would work well out of the box, and be able to know the actual variance if needed…


Thanks @NathanReb for all of your work on this problem. The proposal sounds very cool!

I have one question about the new extension nodes, building on your effect syntax example:

Would it be possible to keep the extension payload abstract outside of Ppxlib?

It seems totally reasonable for the library contract to say that “ppx authors should never rely on the actual extension point encoding”. But Hyrum’s Law tells us that users will inevitably take a dependency on visible implementation details, either by accident or by choice. I think we already see some evidence of this in the current ecosystem, where PPX authors may choose to depend on Parsetree constructors directly, even when more future-proof abstractions exist in Ast_builder and Ast_pattern.


@panglesd this is something we’ve been thinking about for quite a while and we also benefited from the experience of some folks at Jane Street that have tried a similar approach in Oxcaml.

We deliberately chose to only use extension points because attributes are all too often ignored and potentially even dropped by ppx-es. That could lead to particularly unsettling bugs for users that can be completely unaware of this whole mechanism.

The advantage of wrapping the encoded nodes in an extension point is that it is highly unlikely a pre-existing ppx will try to interpret it.

In the example you describe, the [@@deriving yojson] would simply not work. Extension node payloads are not traversed when applying context-free rules so no transformation would happen here.
This is part of the contract of this approach: existing ppx-es cannot hope to natively support new features.
In some cases, they might be able to, if the encoded node sits in a part of the tree they do not try to interpret, or that they simply copy over as it is. For example, one can expect the use of new language features inside a [@to_yojson ...] attribute’s payload to work (see ppx_deriving_yojson’s documentation for context).
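A toy model of that behaviour, with a made-up `item` type and a made-up deriver that is unrelated to the real ppx_deriving_yojson implementation: the rule fires on plain type declarations and never looks inside an extension's payload.

```ocaml
(* Toy model only; none of these names come from ppxlib. *)
type item =
  | Type of { name : string; deriving : string list }
  | Value of string
  | Extension of string * item list  (* [%%name items] *)

(* A toy "yojson" deriver: for each type that asks for it, emit an extra
   value next to the declaration. Extension payloads are not traversed. *)
let expand items =
  List.concat_map
    (function
      | Type { name; deriving } as t when List.mem "yojson" deriving ->
          [ t; Value (name ^ "_to_yojson") ]
      | Extension _ as e -> [ e ] (* payload deliberately left untouched *)
      | other -> [ other ])
    items
```

With this sketch, a declaration wrapped in an encoding extension comes out unchanged, with no generated value, which mirrors the "would simply not work" case described above.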

The interaction between old ppx-es and new features is not our main focus here and won’t drive how we encode newer nodes. It might work by accident, but whenever you’re passing down a new language feature to an old ppx, you should expect it to fail.

We prioritize the stability of existing ppx-es and code over support for new features. This will help prevent unwanted tampering with their encodings and the associated catastrophic and hard-to-debug failures for end users.

Consistent support for new features will still require ppx-es to upgrade, the difference is that they will keep working the way they used to until that happens.

If ppx_yojson needed to support bivariant type parameters annotation that would require that:

  1. ppxlib migrations detect supported attributes such as @@deriving attached to the type declaration to take it out of the encoding, essentially resulting in something like:
[%%ppxlib.migration.type_decl_with_bivariant_parameter
  type t1 = ...
  and t = ...]
[@@deriving yojson]
  2. that ppxlib provided a way to declare a deriving rule for Pstr_extension ... and Psig_extension ...
  3. that ppx_deriving_yojson released a new version declaring such a new rule and using the rest of our API to properly interpret the type declaration.

In the case of bivariant annotations, we are not going to go to such lengths but if there were legitimate use cases with a similar upcoming language feature, we’d have to provide those.
This showcases well the limitation of this approach and the reason why we will not always provide the compatibility layer without having a valid use case for it.

I think those limitations are acceptable as in practice, ppx-es rarely upgrade to support new features anyway and the majority of them just suffer from AST bumps, not because they need to handle widely used new features but rather because they must adjust to the same source code being represented in a slightly different way.

I do sincerely hope that such hard-to-encode new features won’t be too frequent, but even if they are, the advantage is that the hard work will be handled by ppxlib maintainers and should be transparent to the rest of the ecosystem.


Thanks a lot @NathanReb for the very clear and detailed reply! I understand the reasoning that stability is more important than supporting new features, you’re probably right. Time and future Parsetree change will tell :slight_smile:


@conroj I don’t think we can reasonably make it abstract without encoding the payload as a binary string that only we can decode. Even then, dune makes it possible for users to eventually call “private API” functions so we likely could not even prevent it this way.

We do not intend on changing the encoding often but as @pitag pointed out, this approach is bound to eventually cause unexpected bugs. We cannot guarantee that the encoding will remain stable as we might have to adapt it to fix unforeseen issues with the initial one.

The best we can do is discourage ppx authors from relying on the encoding directly and encourage them to use our stable API instead, which will handle the new encoding.

I’m well aware this is not perfect but I hope ppx authors will be reasonable and understand that not complying with those directives can lead to unwanted breakage as new releases come out, impacting them, their users and opam-repository maintainers.


Thanks @NathanReb, that’s fair.

You’ve probably thought about this already, but the extension’s namespace might be a good place to provide cues about the lack of API stability:

[%ppxlib_internal_only.migration.ppat_effect? (Xchg msg, k)]
