A question about meta-quotation (via PPX)

Chet_Murthy · March 12, 2023, 12:53am

I’m curious about something, and thought I’d ask what people thought. It’s an involved explanation, but there is a real question at the bottom, which is basically “is this too complicated and involved?”

Background

Some background: there’s a PPX rewriter called ppx_metaquot. It takes the text of some OCaml AST item, wrapped in an extension tag, viz.

[%type: bool * char]

or

[%type: [%t? b] * [%t? c]]

and it expands that to the corresponding AST. From the documentation, there’s one big problem with it: for anything that isn’t a spot where an extension can go, you can’t easily do antiquotations. The example they give is: in

[%str module M = struct ... end][@subst let M : string = s]

The idea there is that you want to use the contents of the variable s as the module name. You can’t specify this directly in OCaml’s AST, so you’re … stuck.

A Different Approach

The problem here is that we’re assuming we cannot modify the OCaml AST. But what if we could? Imagine that we have a parallel OCaml AST type family that differs in the following manner from the official one:

Suppose we have a type

type 'a vala = VaVal of 'a | VaAnt of string

so in a spot like that module-binding, which is

  | Pstr_module of module_binding  (** [module X = ME] *)
....
and module_binding =
    {
     pmb_name: string option loc;
     pmb_expr: module_expr;
     pmb_attributes: attributes;
     pmb_loc: Location.t;
    }

you’d just change the type of the field (notice the vala below)

pmb_name : string vala option loc
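To make the shape of that change concrete, here is a minimal, self-contained sketch. It does not use the real Parsetree: the `loc` record and the one-field `module_binding` are simplified stand-ins, the helper `describe_name` is hypothetical, and the payload format `"uid:s"` carried by `VaAnt` is an assumption about how the antiquotation text might be stored.

```ocaml
(* Simplified stand-in for Location's 'a loc; only the shape matters here. *)
type 'a loc = { txt : 'a; loc : string }

type 'a vala =
  | VaVal of 'a      (* an ordinary, concrete value *)
  | VaAnt of string  (* an antiquotation; payload format is assumed *)

(* Hypothetical cut-down module_binding with the vala-annotated name field. *)
type module_binding = { pmb_name : string vala option loc }

(* Report what occupies the module-name slot. *)
let describe_name (mb : module_binding) : string =
  match mb.pmb_name.txt with
  | None -> "anonymous module"
  | Some (VaVal name) -> "concrete name " ^ name
  | Some (VaAnt payload) -> "antiquotation " ^ payload

let concrete = { pmb_name = { txt = Some (VaVal "M"); loc = "<none>" } }
let quoted = { pmb_name = { txt = Some (VaAnt "uid:s"); loc = "<none>" } }

let () =
  print_endline (describe_name concrete);
  print_endline (describe_name quoted)
```

The point of the sketch is only that every consumer of the field now has to handle the `VaAnt` case, which is the slot where a pattern variable can sit.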

Now, you modify the grammar in a similar manner, and invent some syntax for antiquotations that doesn’t conflict with OCaml syntax today, and … you can write (notice the raw-string, b/c this isn’t parseable by the official OCaml parser, but by our modified parser):

[%str {|module $uid:s$ = struct ... end|}]

And then we can emit either OCaml pattern or expression syntax for the official AST.
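As a rough illustration of that emission step, the sketch below prints source text for the name slot only. Everything in it is an assumption made for illustration: the `"kind:var"` payload format, and the hypothetical helpers `antiquot_var`, `emit_name` and `emit_name_slot`. A real implementation would build AST nodes rather than strings.

```ocaml
type 'a vala = VaVal of 'a | VaAnt of string

(* Extract the variable from an assumed "kind:var" payload; a payload
   without a colon is taken to be the variable itself. *)
let antiquot_var (payload : string) : string =
  match String.index_opt payload ':' with
  | Some i -> String.sub payload (i + 1) (String.length payload - i - 1)
  | None -> payload

(* A concrete value becomes a string literal; an antiquotation becomes
   a bare variable. *)
let emit_name (v : string vala) : string =
  match v with
  | VaVal s -> Printf.sprintf "%S" s
  | VaAnt payload -> antiquot_var payload

(* Emit text for the whole optional name slot. *)
let emit_name_slot (name : string vala option) : string =
  match name with
  | None -> "None"
  | Some v -> "Some " ^ emit_name v

let () =
  print_endline (emit_name_slot (Some (VaVal "M")));
  print_endline (emit_name_slot (Some (VaAnt "uid:s")))
```

The emitted text `Some s` reads as an expression that uses the variable s when building an AST, and as a pattern that binds s when matching one, which is why one translation can serve both directions.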

And this works for literally every spot in the grammar where there is a nonterminal producing a value that is directly installed into some location in the AST. So you can do:

[%typ {| $list:l$ $longid:t$ |}]

Advantages

You can use surface-syntax-based pattern-matching to access (and build) far more of the AST (with antiquotations) than you can do in an approach that doesn’t change the grammar. OCaml’s AST changes a bit from version-to-version, but over a series of versions it changes enough that it makes updating PPX rewriters a chore. If most of the work of accessing the AST were via meta-quotation, then a lot of that updating would be done automatically … by the updated metaquotation PPX rewriter. That’s going to make it much easier to maintain rewriters.

Also, it’s just easier to write PPX rewriters when using meta-quotation, b/c you don’t actually have to know the details of all the AST types – just the surface syntax, and where various antiquotations go. This makes writing PPX rewriters much, much, MUCH, MUCH easier. I mean really a lot easier.

Limitations

What are the limitations of this technique? Well, it seems that things like attributes are stored (for key AST types: expression, pattern, core_type, a few others) in a record, and some of the fields of the record do not correspond to nonterminals in the grammar. So you can’t pull out attributes, without pulling out other things that aren’t constructible.

[Concretely, if you want the attributes of an expression, you might want to write [%exp {| $e$ $attrs:l$ |}], but the antiquotation e doesn’t correspond to anything in the grammar. Of course, that’s fixable, but at the price of changing the grammar in ways that aren’t part of what I’ll describe below.]

Costs

What’s the cost of this approach?

make a copy of most of OCAMLSRC/parsing
scatter these vala annotations around in parsetree.mli, longident.mli and maybe a few other files.
add some injection and projection functions in some of the code of the grammar and helper functions.
and modify the grammar to use these antiquotations.
last, of course, write the code to convert this “pattern parsetree” into patterns and expressions. But that’s entirely outside of these changes, and in any case it can be implemented as a PPX deriver.

The nice thing is, you can erase the above annotations (the “vala”, and projection/injection functions) and recover the original parser. So in principle, these changes are checkable and easily-maintainable going forward.
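A small sketch of what such injection and projection functions might look like, and why erasing them is harmless. The names `inject`, `project` and `vala_map` are hypothetical and chosen for illustration; they are not taken from any existing code base.

```ocaml
type 'a vala = VaVal of 'a | VaAnt of string

(* Injection: wrap a plain value where the modified grammar needs a vala. *)
let inject (x : 'a) : 'a vala = VaVal x

(* Projection: recover the plain value; an antiquotation has none. *)
let project (v : 'a vala) : 'a option =
  match v with
  | VaVal x -> Some x
  | VaAnt _ -> None

(* Lift an existing helper over vala so antiquotations pass through untouched. *)
let vala_map (f : 'a -> 'b) (v : 'a vala) : 'b vala =
  match v with
  | VaVal x -> VaVal (f x)
  | VaAnt payload -> VaAnt payload

let () =
  (* On concrete values, injecting and then projecting is the identity. *)
  assert (project (inject "M") = Some "M");
  (* A lifted helper behaves like the original on concrete values. *)
  assert (vala_map String.length (inject "abc") = VaVal 3);
  print_endline "round-trip ok"
```

If `'a vala` is erased to plain `'a`, then `inject` and `vala_map f` collapse to the identity and to `f`, and every concrete-value `project` becomes the identity too, which is the sense in which the annotated sources reduce back to the original parser.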

The Question

OK. So now to my question: is this an interesting way to achieve better meta-quotation? Or is the requirement to make (what I would consider anodyne) changes to the parser and support code (in a copy, but nevertheless) a deal-breaker?

gasche · March 12, 2023, 9:09am

Very naive question: why not add support for extension points in the necessary positions in the upstream parser and AST?

Chet_Murthy · March 12, 2023, 9:45am

One supposes that the metaquotation facility being proposed would have to be pretty valuable, to merit that sort of change to the base distribution. Before you could go that far, you’d want good evidence, and such evidence might consist in uptake of this approach to metaquotation.

Full disclosure: [as you might have guessed] I’ve implemented the deriver that processes the OCaml AST type-definitions to produce the function that takes an “AST with pattern-vars” and outputs patterns/expressions over the original OCaml AST … in Camlp5. That would have to be rewritten without Camlp5. So there’s some nontrivial amount of infrastructure that goes with this approach.

gasche · March 12, 2023, 9:56am

I am reasonably convinced that metaquotations are a useful thing to have, given that we used them a lot in the old Camlp4 times. In my experience upstream frontend maintainers are happy to make changes of general interest as long as the ppxlib people (who in practice bear the costs of modifying the parsetree) are in the loop and agree with the proposed change. Doing this upstream sounds easier than maintaining a separate copy of the frontend – which is, indeed, what Camlp5 is already doing.

Another argument in favor of the change is that allowing extension points in more places sounds useful in general for ppx authors, not just for metaquotations.

Note: I re-learned today that there exist two maintained implementations of metaquotations for ppx, namely the metaquot package from @thierry-martinez and the ppxlib.metaquot package maintained inside ppxlib. I am not sure which one should be used when. I would guess that most ppx authors just stick with what’s inside ppxlib.

zbaylin · March 12, 2023, 7:43pm

I’ve taken to using @thierry-martinez’s metaquot simply because of this blurb in the README:

metaquot is built by meta-programmation over the Parsetree module (thanks to metapp) and is meant to be trivial to update for future versions of OCaml

I’ve had issues in the past with upgrading ppxs to support new versions of OCaml due to ppxlib.metaquot, but haven’t had the same experience with just metaquot.

Chet_Murthy · March 12, 2023, 8:01pm

Yes! My belief is that the lack of really comprehensive metaquotation facilities, ones that make it unnecessary to know the details of the OCaml AST, is the real roadblock to getting everybody writing PPX rewriters.
