Ppxlib.metaquot and identifiers

Hello! I’m trying to learn current best practices for writing PPX transformers and am curious about the capabilities of ppxlib.metaquot. I’m in a situation in which I want to generate a function for a type declaration. To simplify the example, suppose I want to transform

[%foo type 'a bar = 'a]

into

type 'a bar = 'a
let foo_bar (type a) (x : a bar) = failwith "foo!"

(I’m aware that the above would ordinarily be done with a deriver.) In my PPX, I’d love to be able to write something quite simple:

let create_function_from_type (tdecl : type_declaration) : structure_item =
  ... (* lots of looking at tdecl here *)
  let tdecl_name = ... in
  let tdecl_type = ??? in
  [%stri
    let [%p (Pat.var ~loc ({loc;txt=tdecl_name}))] (type a) (x : a [%t tdecl_type]) =
      failwith "foo!"
  ]

I have three questions about the above:

  1. Am I going about this the right way with the [%p ...] extension above? That is: is there any other way to insert an identifier into escaped code other than to build a wrapper around it?
  2. Is there any way to write a value for tdecl_type above that will allow the kind of parameterization I want? I’m pessimistic given the shape of the OCaml AST, but I’d feel better about that conclusion if other people could verify it.
  3. What is wrong with the syntax of the [%stri ...] extension above? Even if I discard the tdecl_type part and go with
    [%stri
      let [%p (Pat.var ~loc ({loc;txt=tdecl_name}))] (type a) (x : a) =
        failwith "foo!"
    ]   
    
    I get a syntax error at the beginning of (type a). (There is no message; ocamlc just says “syntax error”.) Do I have to instead go with
    [%stri
      let [%p (Pat.var ~loc ({loc;txt=tdecl_name}))] = fun (type a) (x : a) ->
        failwith "foo!"
    ]
    
    Is this just a limitation of the extension syntax?

My understanding is that ppxlib.metaquot should help me avoid writing ASTs by hand. I’m finding in my current project that I am constructing a lot of ASTs using Ast_helper and destructing a lot of ASTs using match. Based on my understanding, I conclude that I’m using these tools incorrectly. Any advice on style or approach is appreciated! :slight_smile:

  1. You are doing all right
  2. I don’t understand the question
  3. Your snippet works for me. It seems the issue is somewhere else…

[oof, let’s try that again]

  1. First, your syntax error. If you look at https://github.com/ocaml/ocaml/blob/9b059b1e7a66e9d2f04d892a4de34c418cd96f69/parsing/parser.mly#L2642 you’ll find that the path you’re going down in the grammar for the text
    let f a b c = argle bargle goo

    [where you want “f” to be an extension] is handled by the production let_ident strict_binding, and let_ident is an identifier. So you can’t use an extension there.

  2. But there’s something more general that one can observe, in three parts:

a. lots of times the grammar and the AST don’t “line up” in the sense that subtrees of the AST don’t correspond to subtrees in the parse-tree. [hence “abstract” in “abstract syntax tree” grin]. This can mean that there’s no place in the grammar to put that “metaquotation”.

b. lots of times in the grammar, somebody thought to put “val_ident” but didn’t allow for that to be an extension (so that somebody could use a metaquotation to supply that part of the AST)

c. and this gets reflected in the AST itself: you see “string” in the AST at the point where that “val_ident” is supposed to be put, and not something that could allow an extension.

And finally d. the grammar is designed for parsing to the AST, and typically the authors of the grammar aren’t thinking about how to make it most commodious for metaquotations over the AST, too. So naturally, there will be many spots in the grammar where there is a single nonterminal, where you can’t put an extension or a metaquotation variable.

OK, so that’s the explanation. Now for the “attitude”. I’ve argued for a while that if you’re going to do metaquotations over an AST, you shouldn’t want to use the actual parser that is used to parse that syntax, and you shouldn’t want to use the actual AST either. Instead, you want to use a modified grammar that parses to a modified AST. I’ve implemented this method for the OCaml AST in the package pa_ppx_parsetree for all the versions of the OCaml grammar 4.10-5.1, inclusive. For each, I took the grammar and the associated AST type, and did the “delta” of adding antiquotations everywhere it seemed practical, in a systematic manner. It turns out to be very straightforward to maintain these alongside the original OCaml grammar and AST. And using the Camlp5 facility for automatically constructing metaquotations (or quasi-quotations) from a suitably modified AST type (the package pa_ppx_q_ast), we can automatically get the rest of the metaquotation machinery (the conversion from “AST type + antiquotations” to “expressions on the AST type”). Both these packages are released on opam, and I’ve used them below.

The modifications are to insert all the spots where you can use metavariables, so that you don’t have to rely on the existing identifiers of the language to serve as metavariables. So this is a great example of what I mean. Here’s your example:

# let loc = Location.none ;;
# let doit f = [%structure_item {| let $lid:f$ (type a) x = failwith "foo" |}] ;;
# Fmt.(pf stdout "%a" Pprintast.structure_item (doit "g")) ;;
let g (type a) x = failwith "foo"- : unit = ()

In “doit” there’s a metavariable “f” of type “string”. And this was done by modifying the grammar so that in the production for “let_ident” we can have either a string, or an antiquotation (like “$lid:f$”). And similarly, we modify the AST type so that a pattern-variable can be either a string or an antiquotation.

Here are the details for “let_ident”: the type pattern_desc is modified thus:

  | Ppat_var of string Ploc.vala loc  (** A variable pattern such as [x] *)

and the grammar for let_ident:

%inline let_ident:
    val_ident_vala { mkpatvar ~loc:$sloc $1 }

and “val_ident_vala” is

%inline val_ident_vala:
   vala(val_ident, ANTI_LID) { $1 }
;

where vala is

%inline vala(X,anti):
   X
     { vaval $1 }
  | anti
     { vaant $1 }
;
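
To make the vala rule above concrete, here is a minimal stand-alone model in plain OCaml. It is only a sketch: the vala type is a simplified stand-in for Camlp5's Ploc.vala, vaval and vaant are assumed to be thin wrappers around its two constructors, and the toy pattern type, resolve and name_of_pattern are hypothetical names invented for this illustration, not part of pa_ppx_parsetree.

```ocaml
(* Simplified stand-in for Camlp5's Ploc.vala: a slot holds either a
   concrete value or the payload of an antiquotation. *)
type 'a vala =
  | VaVal of 'a
  | VaAnt of string

(* Assumed counterparts of the semantic actions used in the grammar rule. *)
let vaval x = VaVal x
let vaant payload = VaAnt payload

(* Toy pattern type mirroring the modified Ppat_var constructor
   (locations omitted to keep the model small). *)
type pattern = Ppat_var of string vala

(* Hypothetical helper: resolve a slot by looking the antiquotation
   payload up in an association list of metavariable bindings. *)
let resolve env = function
  | VaVal s -> s
  | VaAnt name -> List.assoc name env

let name_of_pattern env (Ppat_var slot) = resolve env slot
```

In this model, Ppat_var (vaval "g") is what an ordinary identifier parses to, and Ppat_var (vaant "f") is what an antiquotation parses to; the real machinery turns the latter into an OCaml expression rather than doing a lookup at run time.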

and ANTI_LID is a new lexeme

"$" "lid:" ([^ '$']* as payload) "$"

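As a rough illustration of what that lexeme accepts, here is a hand-written scanner using only the standard library. It is a sketch, not the lexer rule from pa_ppx_parsetree: scan_anti_lid is a hypothetical helper that recognises "$lid:", then a payload containing no '$', then a closing '$'.

```ocaml
(* Recognise "$lid:PAYLOAD$" starting at index [pos] of [s], where PAYLOAD
   contains no '$'. On success, return the payload together with the index
   just past the closing '$'. *)
let scan_anti_lid (s : string) (pos : int) : (string * int) option =
  let prefix = "$lid:" in
  let plen = String.length prefix in
  if pos < 0
     || pos + plen > String.length s
     || String.sub s pos plen <> prefix
  then None
  else
    match String.index_from_opt s (pos + plen) '$' with
    | None -> None (* unterminated antiquotation *)
    | Some close ->
        let payload = String.sub s (pos + plen) (close - pos - plen) in
        Some (payload, close + 1)
```

For instance, scanning "let $lid:f$ x" from index 4 yields the payload "f", which is the string that ends up in the antiquotation slot of the pattern.
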
Thanks for the response! :slight_smile:

With regard to question 2: I’m asking if it’s possible to write a value for tdecl_type such that I can then use ppxlib.metaquot to write something like [%type: a [%t tdecl_type]]. I think the answer is “no”, but I feel confident that there are things one can do with this quotation library that I haven’t yet learned, so I thought I should check.

Can you clarify which snippet works for you? In my create_function_from_type snippet, I get a syntax error because, as Chet_Murthy indicates, there’s nowhere in the AST to attach the pattern. The last snippet in my post works fine for me (but doesn’t address point #2 above).

Again: this was mostly about making sure I wasn’t being naïve about how to write a PPX transformer in a modern context. Thanks for the help!
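
On question 2, one possible route, sketched here under assumptions rather than checked against a particular ppxlib release: in the AST the constructor in a bar is a Longident.t, not a core_type, so there is no hole for [%t ...] in that position. The whole applied type can be built with Ast_builder and spliced in instead. The sketch assumes the file is preprocessed with ppxlib.metaquot; create_foo_function is a hypothetical name, and it uses the fun (type a) form that was reported to parse earlier in the thread.

```ocaml
open Ppxlib

(* Hypothetical helper: given the name of a one-parameter type such as "bar",
   build [let foo_bar = fun (type a) (x : a bar) -> failwith "foo!"]. *)
let create_foo_function ~loc (tdecl_name : string) : structure_item =
  let open Ast_builder.Default in
  (* Build [a bar] by hand: the constructor slot only accepts a Longident. *)
  let applied =
    ptyp_constr ~loc (Located.lident ~loc tdecl_name) [ [%type: a] ]
  in
  (* The bound name is a pattern, so it can be spliced with [%p ...]. *)
  let fn_pat = pvar ~loc ("foo_" ^ tdecl_name) in
  [%stri let [%p fn_pat] = fun (type a) (x : [%t applied]) -> failwith "foo!"]
```

Printing create_foo_function ~loc:Location.none "bar" with Pprintast.structure_item should show a binding for foo_bar whose parameter is annotated with a bar. So metaquot handles the overall shape, and Ast_builder fills in the one node that has no antiquotation slot.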

Hi! I really appreciate the thorough response. I’m generally in agreement with the attitude you’re describing. Put another way: one can (often quite comfortably) view syntactic metaprogramming markers as an extension to the existing language. If we adopt that perspective, then we should adopt it throughout the toolchain: there should be a parser, a parse tree data structure, etc. for the superlanguage and the metaprogramming operation should be a well-defined transformation from the superlanguage to the original language. I understand the purpose of ppxlib.metaquot — tools like Merlin can already read OCaml PPX syntax, so it’s less effort to build something on top of that — but I’m with you on the value of a well-defined superlanguage if the engineering resources are available. :slight_smile:

Your response has addressed my underlying concern: I was worried that I was using ppxlib incorrectly, as much of the documentation encourages the reader to avoid constructing ASTs by hand and to use ppxlib.metaquot whenever possible. I suspect that this advice is much more practical for simple code generators and local code modifications than it is for the sort of thing I’m currently writing. Thanks for the information and affirmation!

Well, I’m not arguing that effort needs to be applied throughout the tool chain: rather, I’m arguing that if you’re going to build something like metaquot then the way to do it is to actually bite the bullet and apply a systematic delta to both the grammar and the AST type.

Also, though: while metaquot might only be suitable for simple code generators and local code modifications (I’ve never used it, so I don’t know), the entire point of such quasi-quotation facilities should be to allow the sort of extremely extensive AST hacking that you’re referring to. If it doesn’t work for the really complicated stuff, then what was the point of having built it?