Update on the big ppx refactoring project

jeremiedimino · September 25, 2019, 4:37pm

Dear all,

We wanted to give an update on the status of the big ppx refactoring project. At this point, we have settled on a technical solution and are working towards implementing it. In this post I’ll describe briefly how it works and what the plan is. This post directly follows The future of ppx which gives an overview of the ppx history and the issues we want to solve.

We also gave a presentation explaining all this at the OCaml Users and Developers workshop at ICFP this year with @NathanReb, so if you’d like to see a live version of this post, watch out for the video on YouTube!

And before diving into the technicalities, on the social side we welcome @rgrinberg who joined the team!

The solution

It wasn’t easy to come up with this solution, but now that we have it, it is actually quite simple, which means that we can be confident it will work.

Abstracting the AST

To solve the stability problem, we are not re-inventing the wheel. We are simply making use of a very well established method in the functional programming world: abstraction.

Instead of exposing the raw data-types of the OCaml AST to authors of ppx rewriters, we are simply going to expose an API where all the types are abstract. Authors of ppx rewriters will need to use construction functions in order to construct the AST and one-level deconstruction functions to deconstruct it.

Moreover, this API will follow closely the layout of the underlying AST so that we can mechanically follow its evolution. More precisely, there will be one module for every version of the AST with an API that matches the AST at this version.
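To make the idea concrete, here is a minimal toy sketch of an abstract AST with construction functions and one-level deconstruction. It is not the actual ppx API: the `Expression` module, its `view` type and every function name below are hypothetical and only illustrate the shape of such an interface.

```ocaml
(* Toy illustration only: NOT the real ppx API. All names are hypothetical. *)
module type EXPRESSION = sig
  type t (* abstract: the representation is hidden from rewriters *)

  (* Construction functions *)
  val ident : string -> t
  val int_const : int -> t
  val apply : t -> t list -> t

  (* One-level deconstruction: only the outermost node is exposed,
     the children stay abstract. *)
  type view =
    | Ident of string
    | Int_const of int
    | Apply of t * t list

  val view : t -> view
end

module Expression : EXPRESSION = struct
  (* The concrete representation can change without breaking clients. *)
  type t =
    | E_ident of string
    | E_int of int
    | E_apply of t * t list

  let ident s = E_ident s
  let int_const n = E_int n
  let apply f args = E_apply (f, args)

  type view =
    | Ident of string
    | Int_const of int
    | Apply of t * t list

  let view = function
    | E_ident s -> Ident s
    | E_int n -> Int_const n
    | E_apply (f, args) -> Apply (f, args)
end

(* A rewriter-style function written only against the abstract API:
   returns the name of the applied function, if there is one. *)
let applied_function_name (e : Expression.t) : string option =
  match Expression.view e with
  | Expression.Apply (f, _) ->
    (match Expression.view f with
     | Expression.Ident name -> Some name
     | _ -> None)
  | _ -> None
```

Note that looking two levels deep takes two calls to `view`, since the children of a node remain abstract.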

This is all still very fresh and experimental, but here is a sneak peek of what this will look like: https://github.com/ocaml-ppx/ppx/blob/70c0bfd3b7a3e8a27e5ad890801d7c93f7dc69a7/astlib/src/stable.mli

One important detail that will ensure good interoperability between ppx rewriters is that the types will be equal between versions, i.e. V4_07.Expression.t will be exposed as equal to V4_08.Expression.t.
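As a rough sketch of what exposed type equality buys, assume two versioned modules whose interfaces both state that their `t` is the same underlying type. The `Shared` module and the functions `ident` and `to_string` are made up for this example and are not taken from the real library.

```ocaml
(* Hypothetical sketch: these are not the real ppx modules. *)
module Shared : sig
  type expression (* abstract to everyone, including the versioned APIs *)
  val of_string : string -> expression
  val to_string : expression -> string
end = struct
  type expression = string
  let of_string s = s
  let to_string s = s
end

module V4_07 = struct
  module Expression : sig
    type t = Shared.expression (* the equality is visible in the interface *)
    val ident : string -> t
  end = struct
    type t = Shared.expression
    let ident = Shared.of_string
  end
end

module V4_08 = struct
  module Expression : sig
    type t = Shared.expression (* same type as V4_07.Expression.t *)
    val ident : string -> t
    val to_string : t -> string
  end = struct
    type t = Shared.expression
    let ident = Shared.of_string
    let to_string = Shared.to_string
  end
end

(* A value built with the 4.07 API is accepted by the 4.08 API
   without any conversion, because the type checker knows both
   types are equal. *)
let built_with_old_api : V4_07.Expression.t = V4_07.Expression.ident "x"
let printed : string = V4_08.Expression.to_string built_with_old_api
```

In this toy setup, a rewriter written against one version can hand its output directly to a rewriter written against another.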

The deconstruction API will be a bit raw and in particular won’t allow nested patterns. To help with this, we will provide view patterns implemented via a ppx rewriter shipped with the ppx package. And of course, meta-quotations will still be available.

Using dynamic types under the hood

The stable APIs are one thing, but we also want to keep the good composition semantics of ppxlib. Trying to compose things using the static types under these multiple APIs would be a bit of a nightmare. So instead we are going full dynamic. During the single-pass rewriting phase, the AST will be represented using dynamic types. Downgrades/upgrades will happen lazily and only at the edges, as requested by individual ppx rewriters, using a history of conversion functions provided by the compiler. In essence, these conversion functions will be very similar to the ones we have in ocaml-migrate-parsetree, except that since they operate on the dynamic types they will be much smaller and focus on the interesting changes.

And because the conversions will happen only when needed, in many cases we will be able to use new language features even if the various ppx rewriters in use are written against older versions of the AST.
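The following toy model may help picture this. It assumes a tiny dynamic tree type, integer version numbers, and one invented change between versions 1 and 2 (a `"Let"` node gaining a leading flag). None of this reflects the actual astlib representation or any real parsetree change; it only shows how small per-change conversions over a dynamic tree can be, and how a conversion can be skipped when the versions already agree.

```ocaml
(* Toy model only: the types, version numbers and the "Let" change
   are hypothetical and do not reflect the real astlib. *)
type node =
  | Node of string * node list (* constructor name and its arguments *)
  | Atom of string

type dyn = { version : int; tree : node }

(* Apply [f] to every node of the tree, bottom-up. *)
let rec map_nodes f = function
  | Atom _ as a -> f a
  | Node (name, args) -> f (Node (name, List.map (map_nodes f) args))

(* Invented change between versions 1 and 2: "Let" gains a leading flag.
   Each function only mentions the node that actually changed. *)
let upgrade_1_to_2 = function
  | Node ("Let", args) -> Node ("Let", Atom "nonrec" :: args)
  | other -> other

let downgrade_2_to_1 = function
  | Node ("Let", Atom "nonrec" :: args) -> Node ("Let", args)
  | Node ("Let", _) -> failwith "this Let is not representable in version 1"
  | other -> other

(* The history of conversions, indexed by the older version of each step. *)
type step = { from_v : int; upgrade : node -> node; downgrade : node -> node }

let history =
  [ { from_v = 1; upgrade = upgrade_1_to_2; downgrade = downgrade_2_to_1 } ]

(* Convert one version at a time until [target] is reached.
   Raises [Not_found] if the history has no step for a needed version. *)
let rec convert ~target d =
  if d.version = target then d (* nothing to do: no conversion happens *)
  else if d.version < target then
    let s = List.find (fun s -> s.from_v = d.version) history in
    convert ~target
      { version = d.version + 1; tree = map_nodes s.upgrade d.tree }
  else
    let s = List.find (fun s -> s.from_v = d.version - 1) history in
    convert ~target
      { version = d.version - 1; tree = map_nodes s.downgrade d.tree }

(* Conversion happens only at the edge, when a rewriter written against
   version [expects] actually needs to look at the tree. *)
let run_rewriter ~expects (rewrite : node -> node) (d : dyn) : dyn =
  let d = convert ~target:expects d in
  { d with tree = rewrite d.tree }
```

In this sketch, a version 1 rewriter can still process a version 2 tree as long as the tree does not use anything that version 1 cannot represent.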

The plan

The plan is to get all of this implemented, proof-test it against a bunch of existing older versions of the AST, and finally eat our own dog food by porting a bunch of ppx rewriters to the new world, making sure that the port is as smooth as possible.

Once this is all done, we will release the 1.0.0 version of the ppx project for public consumption.

It will be possible to use ppx rewriters based on ppx in conjunction with ppx rewriters based on ppxlib or ocaml-migrate-parsetree, so that we don’t need to port everything at once.

Our expectation is that the next time the parsetree changes and authors of ppx rewriters need to update their code, they’ll choose to migrate to ppx at the same time given that it will give them long term stability guarantees.

Drup · September 25, 2019, 5:16pm

I must admit I like this proposed API a lot more than the previous plan. I think this one will lead to much better adoption, especially for people, like me, who tend to write their current PPXs the “old school” way, with only omp.

The only real suggestion I have at this point is to keep the new astlib to the minimum: the API you propose here and the driver (and maybe metaquot/viewppx). I’ll be happy to give more feedback when I can test it.

jeremiedimino · September 26, 2019, 9:01am

What we call astlib won’t be a user-facing library. It will live in the compiler itself and be the smallest possible API that ensures that the ppx world keeps working with new compilers and even trunk. It will only contain:

the definition of the dynamic AST
functions to convert between the static and dynamic ASTs
the history of upgrade/downgrade functions

The user facing package will be ppx, which will be composed of

the ppx library, which in particular will expose the versioned AST interfaces, the view patterns, the driver and various modules ported from ppxlib. It will be approximately the same size as ppxlib except that it will have no dependencies
ppx.metaquot for metaquotations
ppx.view for view patterns

If there is a need to have just the versioned AST interfaces on their own without the rest, we could certainly imagine distributing them as a separate library or even a separate package.

Regarding the driver, it will be similar to the ppxlib driver, i.e. the standard way to register a transformation will be by registering extension point expanders or derivers rather than whole AST mappers, though the latter will still be allowed for special cases. This is simply because such precise transformers have better composition semantics and lead to faster rewriting, so we want to encourage ppx authors to use them.
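As an illustration of why expanders compose well, here is a small self-contained sketch. The `expr` type, the `register` function and the `rewrite` traversal are all hypothetical stand-ins, not the ppx or ppxlib driver API: each expander only handles its own extension name, and a single shared traversal applies all of them.

```ocaml
(* Toy driver: hypothetical names, not the ppx or ppxlib API. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string * expr (* stands for [%name payload] *)

(* Registry mapping an extension name to its expander. *)
let expanders : (string, expr -> expr) Hashtbl.t = Hashtbl.create 8

let register name expand =
  (* Two expanders for the same name would be ambiguous, so reject them. *)
  if Hashtbl.mem expanders name then
    invalid_arg ("extension already registered: " ^ name);
  Hashtbl.add expanders name expand

(* One traversal serves every registered expander. The result of an
   expansion is traversed again, so extensions nested in the payload
   or produced by an expander are handled too. *)
let rec rewrite = function
  | Int _ as e -> e
  | Add (a, b) -> Add (rewrite a, rewrite b)
  | Extension (name, payload) ->
    (match Hashtbl.find_opt expanders name with
     | Some expand -> rewrite (expand payload)
     | None -> Extension (name, rewrite payload))

(* Two independent "ppx rewriters", each unaware of the other. *)
let () = register "double" (fun e -> Add (e, e))
let () = register "succ" (fun e -> Add (e, Int 1))
```

Because neither expander walks the tree itself, adding a third one in this sketch does not add another pass over the AST.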

jeremiedimino · September 26, 2019, 9:01am

And feedback will definitely be welcome!

Chris00 · September 27, 2019, 2:17pm

Will the dynamic AST be serializable? Minimal serializable AST representations are useful to, for example, run toplevels in separate processes and return the output without making assumptions about how it will be used.

Chris00 · September 27, 2019, 2:21pm

BTW, an attribute [@serializable] testifying that the type can be serialized without throwing an exception would be nice.

jeremiedimino · September 30, 2019, 7:40am

It will be, but it will be a bit bigger than the static one.