Dear all,
If you are enjoying the many benefits of ppx, you probably know that it comes at a very high price: every OCaml release can potentially break your code. If you are the author of a ppx rewriter, it is likely that you often need to upgrade your code and re-release it.
This has happened several times since the ppx system landed, and we recently decided that it was time to solve this problem once and for all. In this post, I will discuss the current state of affairs, why we want to change it and how we plan to do it.
This initiative is pushed by Jane Street as we are a big consumer of ppx rewriters. It will be executed by the ppxlib team, currently formed of myself, @xclerc and @NathanReb, as well as Carl Eastlund who is joining us for this particular effort.
The current state of the ppx world
The current ppx world is composed of various components. I briefly describe the ones I know in this section. The order doesn’t correspond to the order in which these components were developed.
The -ppx option of the compiler
ocamlc and ocamlopt both take a -ppx command line option. This option takes as argument a program that is executed during the compilation of a file in order to transform it on the fly. This program is called a ppx rewriter or ppx for short. More precisely, once the OCaml compiler has parsed the source file and constructed an in-memory representation of its structure, often called an Abstract Syntax Tree or AST for short, it runs the ppx with this AST as input. The ppx returns a new transformed AST and the compiler continues the compilation process with this new AST, discarding the original one.
Several -ppx options can be passed to the compiler. In this case, the compiler will apply the various ppx rewriters one by one, each one feeding its output to the next one.
When this option was introduced, the language was also augmented with extension points and attributes, giving ppx rewriters hooks to embed foreign DSLs in OCaml source files for their own purposes.
Ppx rewriters are typically OCaml programs that use the internal compiler libraries to analyse and transform the AST. Most often, they simply expand a few specific extension points and/or interpret a few attributes.
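To make the pipeline described above concrete, here is a minimal sketch of it on a toy AST. Everything in it is an assumption made for illustration: the `expr` type stands in for the compiler's real AST, and `expand_double`, `expand_succ` and `apply_all` are hypothetical names, not part of the compiler libraries.

```ocaml
(* Toy AST standing in for the compiler's real one. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string * expr   (* models an extension point [%name payload] *)

(* In this model, a ppx rewriter is just a function from AST to AST. *)
type rewriter = expr -> expr

(* Hypothetical rewriter: expands [%double e] into e + e. *)
let rec expand_double : rewriter = function
  | Extension ("double", e) -> let e = expand_double e in Add (e, e)
  | Extension (name, e) -> Extension (name, expand_double e)
  | Add (a, b) -> Add (expand_double a, expand_double b)
  | Int _ as e -> e

(* Hypothetical rewriter: expands [%succ e] into e + 1. *)
let rec expand_succ : rewriter = function
  | Extension ("succ", e) -> Add (expand_succ e, Int 1)
  | Extension (name, e) -> Extension (name, expand_succ e)
  | Add (a, b) -> Add (expand_succ a, expand_succ b)
  | Int _ as e -> e

(* Passing several -ppx options amounts to chaining the rewriters:
   each one receives the output of the previous one. *)
let apply_all (rewriters : rewriter list) (ast : expr) : expr =
  List.fold_left (fun ast rewrite -> rewrite ast) ast rewriters

let result =
  apply_all [expand_double; expand_succ]
    (Extension ("double", Extension ("succ", Int 3)))
```

Note that each rewriter traverses the whole tree and leaves the extension points it does not know untouched, so the full AST is walked once per rewriter.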
The -ppx option, a few modules of the compiler libraries, extension points and attributes form the basis of the ppx system. These were developed and integrated in the OCaml compiler mostly by @alainfrisch several years ago. The original motivation for this work was to provide a technically simpler replacement of Camlp4 as well as to enforce a more uniform syntax of the language. Camlp4 was the previous official meta-programming system for OCaml.
ppx_tools
ppx_tools is the original toolbox for authors of ppx rewriters. It is composed of a library of helpers and a couple of tools. It was originally developed by @alainfrisch.
ocaml-migrate-parsetree
The compiler libraries are unstable and often change in incompatible ways. This includes the definition of the AST. ocaml-migrate-parsetree is a library that exposes the AST definition of each major version of the compiler as a separate module, as well as migration functions to convert between the various versions. A ppx rewriter can then choose one single version of AST to work with and ocaml-migrate-parsetree will do the necessary conversions to allow the ppx rewriter to be used with a different version of the compiler.
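The idea of versioned AST snapshots plus migration functions can be sketched on a toy example. This is only an illustration under stated assumptions: the modules `V1` and `V2`, the migration functions and `run_on_v2` are hypothetical and do not reflect the actual API of ocaml-migrate-parsetree.

```ocaml
(* Two toy snapshots of an AST. V2 adds a constructor that V1 lacks. *)
module V1 = struct
  type expr = Int of int | Add of expr * expr
end

module V2 = struct
  type expr = Int of int | Add of expr * expr | Neg of expr
end

exception Cannot_migrate of string

(* Upgrading is total: every V1 tree has a V2 equivalent. *)
let rec v1_to_v2 : V1.expr -> V2.expr = function
  | V1.Int n -> V2.Int n
  | V1.Add (a, b) -> V2.Add (v1_to_v2 a, v1_to_v2 b)

(* Downgrading is partial: V1 cannot represent the newer construct. *)
let rec v2_to_v1 : V2.expr -> V1.expr = function
  | V2.Int n -> V1.Int n
  | V2.Add (a, b) -> V1.Add (v2_to_v1 a, v2_to_v1 b)
  | V2.Neg _ -> raise (Cannot_migrate "Neg is not representable in V1")

(* A rewriter written once, against V1 only: it folds constant additions. *)
let rec fold_constants : V1.expr -> V1.expr = function
  | V1.Add (a, b) ->
    (match fold_constants a, fold_constants b with
     | V1.Int x, V1.Int y -> V1.Int (x + y)
     | a, b -> V1.Add (a, b))
  | V1.Int _ as e -> e

(* Using that rewriter with a "V2 compiler": convert down, rewrite, convert up. *)
let run_on_v2 (rewrite : V1.expr -> V1.expr) (ast : V2.expr) : V2.expr =
  v1_to_v2 (rewrite (v2_to_v1 ast))

let folded = run_on_v2 fold_constants V2.(Add (Int 1, Add (Int 2, Int 3)))
```

In this sketch the downward migration fails as soon as the source uses `Neg`, which mirrors a limitation of the approach: a rewriter pinned to an older AST cannot process code that uses newer language constructs.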
In addition, it provides a small driver functionality, which allows linking several ppx rewriters into a single executable in order to use a single -ppx option of the compiler rather than several. This allows ocaml-migrate-parsetree to perform the minimum number of AST conversions in order to speed up the overall process.
Finally, ocaml-migrate-parsetree snapshots not only the AST but also the few modules from the compiler libraries that form the basis of the ppx system. This was done to ease the port of existing ppx rewriters to ocaml-migrate-parsetree. However, we have now come to regret this choice, as it makes it difficult to support new versions of the compiler.
ocaml-migrate-parsetree was initially developed by @let-def, and I myself joined the project in its early days as I was eager to use it for the Jane Street suite of ppx rewriters.
ppx_tools_versioned
ppx_tools_versioned extends ocaml-migrate-parsetree to ppx_tools. More precisely, ppx_tools_versioned is a package that contains one full copy of ppx_tools for each version of the AST snapshotted by ocaml-migrate-parsetree. This allowed ppx rewriters using ppx_tools to be easily ported to ocaml-migrate-parsetree.
ppx_tools_versioned was created by @let-def.
ppx_deriving
ppx_deriving is a ppx rewriter that automatically derives code from type definitions. A list of derivers can be attached to a type definition via a [@@deriving] attribute. ppx_deriving provides a few derivers and third party projects can implement their own derivers. Each deriver must register itself against the ppx_deriving library. For this reason, the various derivers must be linked inside the same executable. For this purpose, ppx_deriving offers a driver functionality. This driver supports both static and dynamic linking of the various plugins.
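As an illustration of what "deriving code from a type definition" means, the sketch below writes by hand roughly the kind of function a printing deriver could generate. The type `point` is a hypothetical example, and the exact code and output format produced by a real deriver may well differ.

```ocaml
(* With a deriver, one would write something like:

     type point = { x : int; y : int } [@@deriving show]

   and the function below would be generated automatically.
   Here it is written by hand, as an approximation. *)
type point = { x : int; y : int }

let show_point (p : point) : string =
  Printf.sprintf "{ x = %d; y = %d }" p.x p.y

let example = show_point { x = 1; y = 2 }
```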
ppx_deriving predates ocaml-migrate-parsetree. However, nowadays the driver part of ppx_deriving uses the ocaml-migrate-parsetree driver as a backend, so that ppx_deriving, deriving plugins and other ppx rewriters can be linked as part of the same ppx driver. Apart from that, ppx_deriving is still based on the current version of the OCaml AST, meaning that every new OCaml release can potentially break it.
ppx_deriving was developed by @whitequark in the early days of ppx. Nowadays @gasche is the main maintainer of ppx_deriving.
ppxlib
ppxlib is a comprehensive library that exposes a higher level abstraction for authors of ppx rewriters. More precisely, ppx rewriters are no longer seen as black boxes that transform the full AST and must be applied one by one. Instead, extension points are seen as compile time functions that are evaluated in a top-down manner. Not only does this lead to much better performance, as the whole rewriting is always done in a single pass, but it also provides a much better model for authors and users of ppx rewriters. In particular, it is much easier to reason about how several ppx rewriters compose with each other. Without ppxlib, it is up to the final user to understand the low-level details of the various ppx rewriters in order to understand whether they can be used simultaneously and how.
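The model of "extension points as compile time functions evaluated top-down in a single pass" can be sketched on a toy AST as follows. This is not ppxlib's API: the `expr` type, the `expanders` table, `register` and `expand_all` are all hypothetical names chosen for the illustration.

```ocaml
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string * expr   (* models [%name payload] *)

(* Each extension point is handled by a function from payload to expression. *)
let expanders : (string, expr -> expr) Hashtbl.t = Hashtbl.create 8

(* Registration rejects two expanders claiming the same extension name,
   so conflicts are detected up front. *)
let register (name : string) (expand : expr -> expr) : unit =
  if Hashtbl.mem expanders name then
    failwith ("extension already registered: " ^ name);
  Hashtbl.replace expanders name expand

let () =
  register "double" (fun e -> Add (e, e));
  register "succ" (fun e -> Add (e, Int 1))

(* One traversal for all registered extensions: the outermost extension is
   expanded first, then the traversal continues into the result. *)
let rec expand_all (e : expr) : expr =
  match e with
  | Int _ -> e
  | Add (a, b) -> Add (expand_all a, expand_all b)
  | Extension (name, payload) ->
    (match Hashtbl.find_opt expanders name with
     | Some expand -> expand_all (expand payload)
     | None -> failwith ("uninterpreted extension: " ^ name))

let result = expand_all (Extension ("double", Extension ("succ", Int 3)))
```

Because the driver owns the traversal, the individual expanders never see the whole tree, and an extension that nobody handles is reported instead of being passed along silently.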
Ppxlib also provides safety guarantees by checking that all attributes are interpreted, ensuring that typos are caught instead of being silently ignored. This was in fact the original motivation for the development of ppx_core, the ancestor of ppxlib.
Ppxlib exposes a Ppxlib.Deriving module providing the same functionality as ppx_deriving. A small common dependency called ppx_derivers ensures that deriving plugins based on either ppxlib or ppx_deriving can be used simultaneously.
It also offers a driver functionality which is built on top of the ocaml-migrate-parsetree one to ensure maximum inter-operability. The library itself is based on one selected version of the AST exposed by ocaml-migrate-parsetree. This way, when a new OCaml compiler is released ppxlib and ppx rewriters based on ppxlib usually continue to work as before. However, when ppxlib bumps the version of the AST it is based on, all clients of ppxlib can potentially break and need to be upgraded and re-released.
ppxlib is the result of a merge between several older ppx projects. These projects were developed at Jane Street and started during the port of our code base from Camlp4 to ppx that was performed by myself and Nick Chapman. I am the original author of a lot of the architecture and code of ppxlib, although some of the code of Ppxlib.Deriving is much older than this and dates back to the Camlp4 days.
dune
dune is not strictly part of the ppx world. However, it orchestrates the composition of ppx rewriters by linking static ppx drivers on demand. Dune doesn’t support arbitrary ppx rewriters, only the ones that can be linked together as part of the same driver. Additionally, when doing so all the ppx rewriters must be based on the same driver backend.
Nowadays, the vast majority of ppx rewriters are based on the ocaml-migrate-parsetree driver in one way or another.
Why is ppx so painful?
The main reason why ppx rewriters are such a pain is that the system is based on the compiler libraries. The compiler libraries are meant for experts and provide no stability guarantee. With such an unstable basis, it is no surprise that the whole system keeps breaking.
ocaml-migrate-parsetree improved the situation by making it possible to sandbox individual ppx rewriters inside a protective layer. However, this sandboxing is not applicable when ppx rewriters need to inter-operate with each other in more sophisticated ways, such as with ppx_deriving or ppxlib. Moreover, users of ppx rewriters cannot use new language features until all the ppx rewriters they use are based on the new version of the AST, which means that all ppx rewriters still need to be updated and re-released after a new release of OCaml.
Finally, there are just too many projects doing the same thing which makes everything really confusing.
What’s the plan?
The plan is to provide a stable base for the whole system that doesn’t break when a new compiler is released or require all ppx rewriters to be re-released. Because there are some complicated problems that cannot be solved without a breaking API change, this new base will be released as a new package that will be called simply “ppx”, although a large part of it will be imported from ppxlib.
We will ensure that one way or another, ppx rewriters based on ppx can be used in conjunction with ppx rewriters based on ppxlib, ocaml-migrate-parsetree, ppx_deriving, ppx_tools, … This will provide a smooth transition story from the old to the new world. I discussed with the authors of the various projects to make sure they are happy with the idea of this new project eventually replacing everything else, to make sure we are not creating “yet another standard”. I also discussed with the other OCaml core developers to establish a strong link between “ppx” and the compiler and make sure that the compiler will never break “ppx”. In particular, it will become much easier to test the trunk of OCaml against all released opam packages.
I am hoping that the stability guarantee provided by this new base will be enough of an incentive for authors of ppx rewriters to switch to it. However, if you have any concerns about this plan, please raise them here or with me privately as soon as possible.
What does it mean for the AST?
The main difficulty of this project is to design a stable representation of the OCaml AST. What I mean by a stable AST is the following: a given file will always parse to exactly the same value no matter the version of the compiler. If this property is true, then one can have good confidence that an AST transformation written now will continue to be valid for a long time.
This is currently not true as the types used to represent the AST keep changing in breaking ways. For this reason, ppx rewriters will no longer see the AST used inside the compiler. Instead, they will work on a different AST that is looser and can represent more than just the current language. In particular, this AST should be able to represent any future version of the language. However, the use of private types and construction functions will ensure that ppx rewriters can only construct valid AST fragments.
We are not far enough into this project to know the final representation of the AST. However, to illustrate the idea, here are two examples of what such an AST could look like (I omitted the locations to keep the examples simple):
(* Example 1: plain s-expressions *)
type t =
| Atom of string
| List of t list
type structure = private t
type expression = private t
...
(* Representation of [let x = 1 in x + 1]:
{[
List [Atom "let";
List [Atom "binding";
List [Atom "ident"; Atom "x"];
List [Atom "int"; Atom "1"]];
List [Atom "apply";
Atom "+";
List [List [Atom "ident"; Atom "x"];
List [Atom "int"; Atom "1"]]]]
]}
*)
(* Example 2: adding slightly more structure *)
type t =
| Int of int
| String of string
| Ident of string
| Let of t list * t
| Binding of { lhs : t; rhs : t }
| Apply of t * t list
| ...
type structure = private t
type expression = private t
...
(* Representation of [let x = 1 in x + 1]:
{[
Let ([Binding { lhs = Ident "x"; rhs = Int 1 }],
Apply (Ident "+", [Ident "x"; Int 1]))
]}
*)
It is easy to see that such ASTs can easily be extended without breaking backward compatibility. Construction functions would ensure that all values of type structure or expression that can be produced are valid AST fragments, i.e. ones that are part of the OCaml language.
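To show how private types and construction functions could work together, here is a small runnable sketch built on the s-expression representation of Example 1. The module name `Ast` and the functions `int`, `ident`, `apply` and `let_` are hypothetical, and the encoding (for instance, operators being wrapped as identifiers) is an assumption of this sketch rather than a planned design.

```ocaml
module Ast : sig
  type t =
    | Atom of string
    | List of t list

  (* Private abbreviation: an [expression] can be read as a [t], but it can
     only be built through the construction functions below. *)
  type expression = private t

  val int : int -> expression
  val ident : string -> expression
  val apply : expression -> expression list -> expression
  val let_ : string -> expression -> expression -> expression
end = struct
  type t =
    | Atom of string
    | List of t list

  type expression = t

  let int n = List [Atom "int"; Atom (string_of_int n)]
  let ident name = List [Atom "ident"; Atom name]
  let apply f args = List [Atom "apply"; f; List args]
  let let_ name rhs body =
    List [Atom "let"; List [Atom "binding"; ident name; rhs]; body]
end

(* [let x = 1 in x + 1], built only from construction functions. *)
let e : Ast.expression =
  Ast.(let_ "x" (int 1) (apply (ident "+") [ident "x"; int 1]))

(* Reading the value requires an explicit coercion to the underlying type. *)
let raw = (e :> Ast.t)
```

Outside of `Ast`, an expression such as `(Ast.List [] : Ast.expression)` is rejected by the type checker, so arbitrary s-expressions cannot be passed off as expressions. New constructs can later be supported by adding new construction functions, without changing the type `t`.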
For pattern matching, we will provide view patterns based on the ideas used in ppx_view so that programmers don’t accidentally write nonsensical patterns, i.e. patterns that can never match anything because they match on values that cannot be produced by the parser. Another safeguard is testing coverage: if one can reach 100% coverage then this is proof that all the patterns are valid.
Using an AST that is not the one of the compiler might seem contrary to the philosophy of ppx. However, @alainfrisch mentioned to me that he did envision that ppx rewriters would use a different more stable AST when designing the original ppx feature. So I would say that we are making ppx what it was meant to be rather than diverging from it.
Conclusion
Ppx has a long and storied history leading to a complex stack that is difficult to maintain. We now want to clean it all up and start afresh with a strong base. While doing so, we are opening the discussion about this work early, and most importantly before the point of no return. So I definitely encourage anyone who is interested in all this or will be affected by these changes to chime in, ask for clarification, challenge the technical decisions and raise any concerns so that together we can build a better, stronger and more unified ppx ecosystem!
Additionally, all this work will be done entirely in a pure open source fashion, which will make it easy for everyone to follow and/or contribute. In particular, help is most definitely welcome. Ppx rewriters are used by a lot of people, so this work will benefit a large part of the OCaml and even Reason communities. So if you are new to OCaml and motivated to make an impact, then this is definitely a project to consider.