Enforcing order of ppx rewrites

When building with dune, is there a way to enforce the order of ppx rewrites?

For example, if I have a ppx that contains a type definition, and there’s a deriving clause attached to that type definition, I want to be sure the containing ppx is applied first, and the deriver after it.


staged_pps may be what you need.

I don’t think I need staged_pps, really.

I’m hoping that ppxs are simply applied in left-to-right order; maybe that’s enough.

I’ve run into the same issue and haven’t found a solution yet.

I created a demo project: https://github.com/Kakadu/dune-demo-two-ppxes. I’d be glad to get some advice on it.


Dune simply links the various ppx rewriters in the order they are specified in the dune file. It’s up to the underlying driver to decide whether or not to take this order into account.

There are currently two drivers out there:

  • the ocaml-migrate-parsetree one
  • the ppxlib one

The ppxlib one is built on top of the ocaml-migrate-parsetree one, and in particular all transformations registered with ppxlib appear as a single transformation registered with ocaml-migrate-parsetree.

ocaml-migrate-parsetree only accepts whole-file transformations, i.e. ast -> ast functions. These are ordered by the version of the OCaml AST they use, in order to minimise the number of AST upgrades/downgrades. It is, however, possible to attach a priority to a particular transformation to force its position in the pipeline.

ppxlib accepts two kinds of transformations: whole-file ones, just like ocaml-migrate-parsetree, and higher-level, well-defined rules that can be merged together. A typical rule rewrites a particular extension point. All rules are merged together and applied in a single pass, which ensures both good performance and well-defined semantics. Indeed, the output of individual rules is rewritten recursively, so if the expansion of a particular extension point produces more extension points or even [@@deriving] attributes, these are properly expanded no matter the order in which the ppx rewriters were specified in the dune file.
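To make the single-pass semantics concrete, here is a toy model in plain OCaml with no ppxlib dependency. The item type, the rule record and the two rules a and b are hypothetical stand-ins, not ppxlib’s API; the sketch only shows why re-expanding the output of each rule makes the result independent of the order in which the rules are listed.

```ocaml
(* Toy model of merged, context-free rules; NOT the real ppxlib API. *)
type item =
  | Decl of string          (* an ordinary, fully expanded item *)
  | Ext of string * string  (* an extension node: [%%name payload] *)

(* A rule maps one extension name to an expansion function. *)
type rule = { name : string; expand : string -> item list }

(* Expand every extension node in a single traversal. The output of a rule
   is itself expanded recursively, so an extension that produces another
   extension is handled wherever its rule appears in [rules]. *)
let rec expand_all (rules : rule list) (items : item list) : item list =
  List.concat_map
    (function
      | Decl _ as d -> [ d ]
      | Ext (name, payload) -> (
          match List.find_opt (fun r -> r.name = name) rules with
          | Some r -> expand_all rules (r.expand payload)
          | None -> failwith ("uninterpreted extension: " ^ name)))
    items

(* Hypothetical rules: [%%a p] yields a declaration plus a [%%b p] node. *)
let rule_a = { name = "a"; expand = (fun p -> [ Decl ("a_" ^ p); Ext ("b", p) ]) }
let rule_b = { name = "b"; expand = (fun p -> [ Decl ("b_" ^ p) ]) }

let input = [ Ext ("a", "t"); Decl "x" ]
let out1 = expand_all [ rule_a; rule_b ] input
let out2 = expand_all [ rule_b; rule_a ] input
```

Both orders yield [Decl "a_t"; Decl "b_t"; Decl "x"], because the [%%b t] node produced by rule a is expanded as soon as it appears, regardless of where rule b sits in the list.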

In general, trying to reason about the order of whole-file transformations is tedious. Authors of ppx rewriters don’t know which other ppx rewriters users will combine with theirs, or how they will all interact with each other. Users of ppx rewriters usually don’t know the low-level implementation details of each rewriter, or how they should be ordered. That’s why in ppxlib we made the design choice that the overall rewriting is always completely independent of the order in which the ppx rewriters are specified.

In conclusion, in today’s world you cannot reliably enforce a particular order. I would suggest describing your use case in more detail so that we can see how it fits in the current world, or how we can extend the world to make it work.


There are three ppxs we’d like to use:

  • ppx1 : traverses the AST, and makes sure that invocations of ppx2 do not occur in particular syntactic positions, and that “deriving bin_io” and another deriving target do not occur anywhere in the original AST; fail if these properties do not hold
  • ppx2 : rewrites some types to include “deriving bin_io” and the other deriving target
  • ppx3 : rewrites based on the other deriving target

So if ppx2 were applied before ppx1, for example, compilation would fail unnecessarily.
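A toy model of the problem: when ppx1 and ppx2 act as whole-file passes, the outcome depends on the order in which they are composed. The decl record and the check and add_bin_io functions below are hypothetical simplifications of ppx1 and ppx2, not real rewriters.

```ocaml
(* Toy "file": a list of type declarations with their deriving lists. *)
type decl = { name : string; deriving : string list }

(* ppx1-like check: reject any declaration that already derives bin_io. *)
let check (file : decl list) : decl list =
  List.iter
    (fun d ->
      if List.mem "bin_io" d.deriving then
        failwith ("deriving bin_io not allowed on " ^ d.name))
    file;
  file (* the check never changes the AST *)

(* ppx2-like rewrite: add bin_io to every declaration's deriving list. *)
let add_bin_io (file : decl list) : decl list =
  List.map (fun d -> { d with deriving = "bin_io" :: d.deriving }) file

(* Compose whole-file passes in the given order, catching a failed check. *)
let run (passes : (decl list -> decl list) list) (file : decl list) =
  match List.fold_left (fun acc pass -> pass acc) file passes with
  | result -> Ok result
  | exception Failure msg -> Error msg

let file = [ { name = "t"; deriving = [] } ]
let good = run [ check; add_bin_io ] file
let bad = run [ add_bin_io; check ] file
```

Running the check first succeeds and yields t deriving bin_io; running the rewrite first makes the check reject code the user never wrote.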

What do ppx1, ppx2 and ppx3 do?

I’m not sure what additional detail you’re seeking, beyond what I wrote above.

ppx1 does not change the AST in any way; it’s just there to enforce syntactic restrictions.

ppx2, besides adding “deriving bin_io” and another deriving target, also generates some functor invocation boilerplate.

ppx3 processes the other deriving target, generating some simple value definitions within a module.

Based on your earlier comment, we should be able to run ppx1 as a separate check, not as part of compile-time processing. At compile time, ppx2 and ppx3 should then work without regard to order.

My current workflow applies the first rewriter and then the second.

  • The first rewriter searches for type declarations annotated with [@@first] and replaces each of them with two or three other type declarations. In the general case, the original declaration is removed from the code. The generated type declarations are annotated with [@@deriving ...] (usually by copying all attributes except [@@first] from the original type definition).
  • [@@deriving second] generates the rest of the code.

Example. Before:

type t = ... [@@first] [@@deriving blah]

After applying first:

type t1 = FullyAbstractType of t [@@deriving blah]
type t = ... (* reconstruct original t using t1*)  [@@deriving blah]
type t3 = ...   [@@deriving blah]

And then [@@deriving blah] is expanded.

I also want to mention that the original type declaration should be removed by the syntax extension, so I can’t use Driver.register_transformation ~rules, because it keeps the original definitions.
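The first step of this workflow can be sketched as a toy transformation over a simplified declaration list. The type_decl record, the string attributes and expand_first are hypothetical; the generated names follow the t1/t/t3 example, and the bodies are placeholders rather than the real reconstruction logic.

```ocaml
(* Toy model of the [@@first] rewriter; NOT ppxlib code.
   Attributes are plain strings such as "first" or "deriving blah". *)
type type_decl = { name : string; body : string; attrs : string list }

(* Replace each declaration marked "first" with three generated ones named
   as in the example (t1, t, t3). The original declaration is dropped and
   every attribute except "first" is copied onto the new declarations.
   Bodies are placeholders; the real rewriter derives them from the original. *)
let expand_first (decls : type_decl list) : type_decl list =
  List.concat_map
    (fun d ->
      if not (List.mem "first" d.attrs) then [ d ]
      else
        let attrs = List.filter (fun a -> a <> "first") d.attrs in
        List.map
          (fun name -> { name; body = "..."; attrs })
          [ d.name ^ "1"; d.name; d.name ^ "3" ])
    decls

let before = [ { name = "t"; body = "..."; attrs = [ "first"; "deriving blah" ] } ]
let after = expand_first before
```

Here after holds t1, t and t3, each carrying only "deriving blah"; a separate, later pass would then expand those deriving attributes, which is exactly the ordering the workflow depends on.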

For ppx1, you can use the ~lint_impl argument of Ppxlib.Driver.register_transformation. It’s designed specifically for this purpose.
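The idea behind a lint pass can be sketched on a toy AST: the check inspects the tree and returns a list of errors instead of a rewritten tree, so it cannot interfere with the other rewriters. This only models the idea; the item type, the lint_error record and lint are hypothetical, and the real ~lint_impl callback works on ppxlib’s Parsetree types.

```ocaml
(* Toy AST: type declarations, possibly nested inside extension nodes. *)
type item =
  | Type_decl of { name : string; deriving : string list }
  | Ext of string * item list (* [%%name ...] wrapping nested items *)

(* Hypothetical error record; ppxlib has its own lint error type. *)
type lint_error = { decl : string; msg : string }

(* ppx1-style restriction: bin_io must not be derived anywhere in the
   original source. The tree is only read, never modified. *)
let rec lint (items : item list) : lint_error list =
  List.concat_map
    (function
      | Type_decl { name; deriving } ->
          if List.mem "bin_io" deriving then
            [ { decl = name; msg = "deriving bin_io is not allowed" } ]
          else []
      | Ext (_, inner) -> lint inner)
    items

let errors =
  lint
    [ Type_decl { name = "t"; deriving = [ "sexp" ] };
      Ext ("foo", [ Type_decl { name = "u"; deriving = [ "bin_io" ] } ]) ]
```

Here errors contains a single entry for u, found inside the extension node, while t passes.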

Thanks for the details. We have a very similar ppx rewriter inside Jane Street, except that instead of using a [@@first] attribute we put the whole type declaration inside an extension point:

[%%foo type t = ... [@@deriving ...] ]
(* or: *)
type%foo t = ... [@@deriving ...]

Using an extension point fits well with the ppxlib model, so I’d suggest doing that instead.
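Why the extension point helps can be sketched with a toy AST: an extension-point rule replaces the node it expands, so the original declaration disappears, whereas a deriver keeps the annotated declaration and appends generated items after it. The item type, the Foo constructor standing in for [%%foo ...] / type%foo, and the generated value names are hypothetical placeholders, not ppxlib’s API.

```ocaml
type item =
  | Type of string * string list (* type name, [@@deriving ...] list *)
  | Value of string              (* a generated value definition *)
  | Foo of string * string list  (* [%%foo type name = ... [@@deriving ...]] *)

let rec expand (items : item list) : item list =
  List.concat_map
    (function
      | Type (name, derivers) as ty ->
          (* deriver-style: the declaration stays, generated values follow *)
          ty :: List.map (fun d -> Value (d ^ "_" ^ name)) derivers
      | Value _ as v -> [ v ]
      | Foo (name, derivers) ->
          (* extension-style: the node itself is replaced. This hypothetical
             %foo emits two declarations carrying the same deriving list,
             which are expanded again in the same pass. *)
          expand [ Type (name ^ "_repr", derivers); Type (name, derivers) ])
    items

let expanded = expand [ Foo ("t", [ "bin_io" ]) ]
```

Here expanded is [Type ("t_repr", ["bin_io"]); Value "bin_io_t_repr"; Type ("t", ["bin_io"]); Value "bin_io_t"]: the Foo node is gone, and the deriving attributes it produced were handled in the same pass without any ordering between rewriters.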

I hadn’t thought about this approach. Thanks!