The future of ppx

Hi,

I understand that designing a clean AST is already a challenging task,
but I’d neverthess like to ask for even more!

I am very interested in developing source-to-source typed-directed
program transformations. Whereas a ppx transformer takes an untyped
AST and produces an untyped AST, I’d like to be able to write a
transformer that takes a typed AST and produced an untyped AST
(because producing types on-the-fly is a lot of unecessary work).

I believe that it would be worth thinking about whether it is
possible to define a single “optionally typed AST”, which could
be used both for the purpose of transforming untyped code and
typed code. There are at least two important benefits to having
a single AST: first, it avoid significant effort duplication;
second, it enables a type-directed transformer to return its argument
whenever no changes need to be applied.

If you think that it is viable to anticipate for support of optional
type information in your “will-remain-fixed-for-ever AST”, I would
be very interested in helping out before it’s too late…

A few remarks:

(1) In OCaml, the parsetree and the typetree are already quite close,
(see my post: What would it take to unify the parsetree with the typedtree?)
so I see no reason why it would be an issue to adding extra type
information to a parse tree.

(2) A number of existing ppx-transformer could take advantage of
extra information brought by the type system, e.g. disambiguation
of “Some (x,y)” as 1-argument constructor rather than a 2-argument
constructor.

(3) In terms of AST representation, I imagine something along the
lines of:

type exp = { exp_loc; exp_desc; exp_attribute;
exp_env: env option; exp_type : typ option }

or, equivalently,

type exp = { exp_loc; exp_desc; exp_attribute; exp_typing }
type exp_typing = Typ_unknown | Typ_known of { typ; env }

(4) Ideally, the toolchain would provide a way to refine an untyped
AST into a typed AST, so that one may iterate several typed-directed
transformation.

(5) It would be useful for the typed AST to include type variable
binding locations.
(see my post: Keeping track of where type variables are bound in the typed AST)

(6) As use case for type-directed program transformations, I have a
collection of high-level data structure transformations that I have
starting working on, but got stuck due to the need for types and fully
resolved names. For example, one transformation consists of eliminating
a record field when it is not used or can be deduced from the value of
the other fields from the record. Another transformation consists of
inlining a record into another one, even though the former has been
implemented in a separate library.

(7) Keep in mind that your future AST, whether or not it carries type,
will be useful not only for source-to-source transformations, but also
for source code analyses or translations to other languages. There are
many potential applications. One that I care for is translating typed
OCaml code into “characteristic formulae”, which enables formal proofs
of imperative OCaml code using Separation Logic in Coq. At the moment,
my tool is implemented as a fork of the OCaml compiler, but this is
obviously very unsatisfactory.

(8) I am personally not entirely convinced that it will work out to freeze the
AST forever, so even if it changes very rarely, it might still be a good idea
to include some kind of version number to enable coexistence of several
versions of the AST if it turns out to be absolutely necessary one day.

Thanks
Arthur

2 Likes