I’m curious about something, and thought I’d ask what people thought. It’s an involved explanation, but there is a real question at the bottom, which is basically “is this too complicated and involved?”
Background
Some background: there’s a PPX rewriter called ppx_metaquot. It takes the text of some OCaml AST item, wrapped in an extension tag, viz.
[%type: bool * char]
or
[%type: [%t? b] * [%t? c]]
and it expands that to the corresponding AST. From the documentation, there’s one big problem with it: for anything that isn’t a spot where an extension can go, you can’t easily do antiquotations. The example they give is: in
[%str module M = struct ... end][@subst let M : string = s]
There, the idea is that you want to use the contents of the variable s for the module name. And you can’t really specify this directly in OCaml’s AST, so you’re … stuck.
A Different Approach
The problem here is that we’re assuming we cannot modify the OCaml AST. But what if we could? Imagine that we have a parallel OCaml AST type family that differs in the following manner from the official one:
Suppose we have a type
type 'a vala = VaVal of 'a | VaAnt of string
so in a spot like that module-binding, which is
| Pstr_module of module_binding (** [module X = ME] *)
....
and module_binding =
{
pmb_name: string option loc;
pmb_expr: module_expr;
pmb_attributes: attributes;
pmb_loc: Location.t;
}
you’d just change the type of the field (notice the vala below):
pmb_name : string vala option loc
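To make that concrete, here is a minimal, self-contained sketch of the idea. It uses simplified stand-ins for the Parsetree types (the loc record and the Pmod_stub constructor are placeholders I made up, not the real compiler-libs definitions), and it assumes that VaAnt carries the raw text found between the antiquotation markers:

```ocaml
(* Simplified stand-in for Location's 'a loc; the real location is elided. *)
type 'a loc = { txt : 'a; loc : unit }

type 'a vala =
  | VaVal of 'a      (* an ordinary, concrete value *)
  | VaAnt of string  (* an antiquotation; assumed to hold its raw text *)

(* Placeholder for the real module_expr type. *)
type module_expr = Pmod_stub

(* The "pattern parsetree" version of module_binding (other fields elided). *)
type module_binding = {
  pmb_name : string vala option loc;
  pmb_expr : module_expr;
}

(* What parsing "module M = ..." would produce: a concrete name. *)
let concrete =
  { pmb_name = { txt = Some (VaVal "M"); loc = () }; pmb_expr = Pmod_stub }

(* What parsing "module $uid:s$ = ..." might produce: a hole named by s. *)
let quoted =
  { pmb_name = { txt = Some (VaAnt "uid:s"); loc = () }; pmb_expr = Pmod_stub }
```

The only difference between the two values is which vala constructor sits in the name slot; everything else in the record is unchanged.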
Now, you modify the grammar in a similar manner, and invent some syntax for antiquotations that doesn’t conflict with OCaml syntax today, and … you can write (notice the raw-string, b/c this isn’t parseable by the official OCaml parser, but by our modified parser):
[%str {|module $uid:s$ = struct ... end|}]
And then we can emit either OCaml pattern or expression syntax for the official AST.
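As a rough illustration of that emission step, the sketch below turns a name slot into OCaml source text. The helper names (ant_variable, emit_name, emit_name_option) and the "kind:variable" payload format are assumptions for the example, not an existing API:

```ocaml
type 'a vala = VaVal of 'a | VaAnt of string

(* Hypothetical helper: drop an optional "kind:" prefix such as "uid:"
   and keep the variable name that follows it. *)
let ant_variable payload =
  match String.index_opt payload ':' with
  | Some i -> String.sub payload (i + 1) (String.length payload - i - 1)
  | None -> payload

(* A concrete name becomes a string literal; an antiquotation becomes
   a bare variable. *)
let emit_name = function
  | VaVal s -> Printf.sprintf "%S" s
  | VaAnt payload -> ant_variable payload

(* Emit the text for a [string vala option] slot. *)
let emit_name_option = function
  | None -> "None"
  | Some v -> Printf.sprintf "Some %s" (emit_name v)
```

The emitted text is valid in both positions: as an expression, Some s reads the variable s; as a pattern, Some s binds it. That is why one traversal of the pattern parsetree could serve for both building and matching.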
And this works for literally every spot in the grammar where there is a nonterminal producing a value that is directly installed into some location in the AST. So you can do:
[%typ {| $list:l$ $longid:t$ |}]
Advantages
You can use surface-syntax-based pattern-matching to access (and build) far more of the AST (with antiquotations) than you can do in an approach that doesn’t change the grammar. OCaml’s AST changes a bit from version-to-version, but over a series of versions it changes enough that it makes updating PPX rewriters a chore. If most of the work of accessing the AST were via meta-quotation, then a lot of that updating would be done automatically … by the updated metaquotation PPX rewriter. That’s going to make it much easier to maintain rewriters.
Also, it’s just easier to write PPX rewriters when using meta-quotation, b/c you don’t actually have to know the details of all the AST types – just the surface syntax, and where various antiquotations go. This makes writing PPX rewriters much, much, MUCH, MUCH easier. I mean really a lot easier.
Limitations
What are the limitations of this technique? Well, it seems that things like attributes are stored (for key AST types: expression, pattern, core_type, a few others) in a record, and some of the fields of the record do not correspond to nonterminals in the grammar. So you can’t pull out attributes, without pulling out other things that aren’t constructible.
[Concretely, if you want the attributes of an expression, you might want to write [%exp {| $e$ $attrs:l$ |}], but the antiquotation e doesn’t correspond to anything in the grammar. Of course, that’s fixable, but at the price of changing the grammar in ways that aren’t part of what I’ll describe below.]
Costs
What’s the cost of this approach?
- make a copy of most of OCAMLSRC/parsing
- scatter these vala annotations around in parsetree.mli, longident.mli and maybe a few other files.
- add some injection and projection functions in some of the code of the grammar and helper functions.
- and modify the grammar to use these antiquotations.
- last, of course, write the code to convert this “pattern parsetree” into patterns and expressions. But that’s entirely outside of these changes, and in any case it can be implemented as a PPX deriver.
The nice thing is, you can erase the above annotations (the “vala”, and projection/injection functions) and recover the original parser. So in principle, these changes are checkable and easily-maintainable going forward.
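Here is a small sketch of what the injection and projection functions might look like, and of why erasure is plausible. The names vaval, unvala and erase_name are hypothetical, chosen for the example:

```ocaml
type 'a vala = VaVal of 'a | VaAnt of string

(* Injection: wrap a concrete value produced by a grammar action. *)
let vaval x = VaVal x

(* Projection: recover the concrete value. It fails on an antiquotation,
   which cannot occur when parsing ordinary OCaml source. *)
let unvala = function
  | VaVal x -> x
  | VaAnt a -> failwith ("unexpected antiquotation: " ^ a)

(* The name slot of module_binding in the official and modified trees. *)
type official_name = string option
type pattern_name = string vala option

(* Mapping the modified slot back to the official one. *)
let erase_name : pattern_name -> official_name = Option.map unvala
```

When no antiquotation is present, unvala (vaval x) is just x, so deleting vala from the types and these two functions from the grammar actions should give back the original code.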
The Question
OK. So now to my question: would this be an interesting means of achieving better meta-quotation? Or is the requirement to make (what I would consider anodyne) changes to the parser and support code (in a copy, but nevertheless) a deal-breaker?