Quasi-quotations for the OCaml AST and PPX rewriters

Preface: I’m not trying to sell you on Camlp5, and if you come away
with the idea of using Camlp5, please reconsider: I fully realize that
the OCaml community has settled on ppxlib, and I’m not trying to
change your minds on that. Rather, I’m trying to suggest ways of
making PPX rewriter authorship/maintenance much, much, MUCH easier.

Also: while the project described in this post uses a Camlp5-based PPX
rewriter to implement the machinery described, this machinery itself,
once complete, could be used to implement that PPX rewriter,
completely freeing it from Camlp5.

Why am I writing this post? I’m looking for people to collaborate
with, who would be willing to use this library to write PPX
rewriters using ppxlib. I’d support them in doing that, but I don’t
know ppxlib, and since I already have a well-developed PPX
infrastructure based on Camlp5, it’s of no use to me to learn it.
This library isn’t finished, but I’ve implemented quasi-quotations for
all of expression (with one exception) and that is enough proof to
me that I can do it for the entire OCaml AST type collection.

So if this looks interesting, and you write PPX rewriters already, I’d
like to collaborate.

TL;DR with pervasive quasi-quotations, we can implement the guts of
the show/pp type-deriver in … 124 lines. 124 lines, and no
actual references to any names (constructors/fields/etc) from
Parsetree. [OK, there’s one].

Quasi-quotation

This post is a discussion of quasi-quotation. Now, ppxlib already
has quotations, but those don’t allow you to do three things:

  1. give a name in your quotation to any arbitrary nonterminal in the
    grammar, so you can use it in patterns/expressions. E.g. in
[%type_decl.loc {| $list:tvl$ $lid:tname$ = $constructorlist:cl$ |}]

we want to take a type-declaration apart, pulling out the list of
parameters, and the list of constructor-declarations.

  1. In a production with a single child terminal/nonterminal,
    distinguish between the parent and child. Viz. pattern:
  | Ppat_var of string Ploc.vala loc

How can we distinguish between the pattern and the identifier in it? E.g. in

[%pattern {| ( $lid:l$ , $p2$ ) |}]

we want the pattern

{ppat_desc =
   Ppat_tuple
     [{ppat_desc =
         Ppat_var {txt = l'; loc = _};
       ppat_loc = _; ppat_loc_stack = _;
       ppat_attributes = []};
      p2'];
 ppat_loc = _; ppat_loc_stack = _;
 ppat_attributes = []}

and you can see that in that list, the first component is

{ppat_desc =
   Ppat_var {txt = l'; loc = _};
 ppat_loc = _; ppat_loc_stack = _;
 ppat_attributes = []}

but the second component is p2'

  1. use quotations on OCaml AST types that aren’t among the small
    number supported by Parse. For instance, on match-cases:
[%case {| $uid:cid$ { $list:patbinding_list$ } ->
             ((const string $string:cid$) ++ (const string " ") ++ (braces (fun pps ( $tuplelist:varpat_list$ ) -> $body$)))
             pps ( $tuplelist:varexp_list$ ) |}]

or the value-bindings that are part of a let-expression.

What might be nice, is if you could use quasi-quotation to
pattern-match on any value of expression without ever having to
reach down into the AST types. And if you could this for all the
types of the AST type collection.

This would mean two things:

  1. you wouldn’t have to memorize all sorts of names: constructors
    fields, etc, but instead, would just need to remember the OCaml
    syntax, and some of the grammar.

  2. As the OCaml AST changes, old PPX code that referenced old AST
    type-definitions can go obsolete and need to be updated. But if
    that code instead used quotations, often the changes to the AST
    types would be hidden inside the quotations, so code would continue
    working.

That’s the pitch. And again, I want to stress that this can be
achieved for the OCaml AST types (not some Camlp5 types), with the
current OCaml grammar (slightly modified), with only the use of a PPX
rewriter that is currently implemented using Camlp5.

Rather than describe in detail how it works, I thought I’d just give
you a somewhat-complex worked-example.

A little example: a show/pp type-deriver, implemented in a page-and-change

Below, I’ve included the “guts” of a show/pp type-deriver – the
code that takes the type-declaration and actually generates the
functions. It works on base types, constructor-datatypes, records,
parameterized types, and references to types from other modules, in
the way you would expect from using the show deriver from
ppx_deriving. The code works against the official OCaml AST
(5.0.0), because that’s what I used to implement the underlying
quasi-quotation mechanism.

From the project pa_ppx_parsetree (

) here’s the “guts” of a show/pp type-deriver, implemented in a
page-and-change (124 lines):

let loc_of_expression e = e.pexp_loc

let expr_unapplist e =
  let rec exrec e acc = match e with
      [%expression {| $e$ $list:l$ |}] -> exrec e (l@acc)
    | e ->  (e,acc)
  in exrec e []

let expr_applist __loc__ e l =
  let l = l |>  List.map (fun e -> (Nolabel,e)) in
  let (e,l0) = expr_unapplist e in
  [%expression {| $e$ $list:l0@l$ |}]

let rec core_type pfx = function
    [%core_type.loc {| ' $lid:tv$ |}] ->
     let pp_name = Fmt.(str "pp_param_%s" tv) in
     [%expression {| $lid:pp_name$ |}]

  | [%core_type.loc {| int |}] -> [%expression {| int |}]
  | [%core_type.loc {| bool |}] -> [%expression {| bool |}]
  | [%core_type.loc {| string |}] -> [%expression {| Dump.string |}]
  | [%core_type.loc {| $longid:li$ . $lid:tname$ |}] ->
     let pp_name = Fmt.(str "pp_%s" tname) in
     [%expression {| $longid:li$ . $lid:pp_name$ |}]

  | [%core_type.loc {| $lid:tname$ |}] ->
     let pp_name = Fmt.(str "pp_%s" tname) in
     [%expression {| $lid:pp_name$ |}]

  | [%core_type.loc {| $t$ option |}] ->
     let f = core_type pfx t in
     [%expression {| option ~none:(const string "None") ((const string "Some ") ++ $f$) |}]
  | [%core_type.loc {| $tuplelist:l$ |}] ->
     let (varpat_list,_, body) = core_type_tuple __loc__ pfx l in
     [%expression {| parens (fun pps ( $tuplelist:varpat_list$ ) -> $body$) |}]

and core_type_tuple __loc__ pfx l =
  let prefixes_types = l |> List.mapi (fun i t -> (Fmt.(str "%s_%d" pfx i), t)) in
  let fmtstring = l |> List.map (fun _ -> "%a") |> String.concat "," in
  let varpat_list = prefixes_types |> List.map (fun (id, _) -> [%pattern {| $lid:id$ |}]) in
  let varexp_list = prefixes_types |> List.map (fun (id, _) -> [%expression {| $lid:id$ |}]) in
  let pplist =
    prefixes_types
    |> List.concat_map (fun (id, t) -> [ core_type id t ; [%expression {| $lid:id$ |}] ]) in
  let body = expr_applist __loc__ [%expression {| pf pps $string:fmtstring$ |}] pplist in
  (varpat_list, varexp_list, body)

and constructor_decl = function
    [%constructor_declaration.loc {| $uid:cid$ |}] ->
     [%case {| $uid:cid$ -> const string $string:cid$ pps () |}]

  | [%constructor_declaration.loc {| $uid:cid$ of $list:tyl$ |}] ->
     let (varpat_list, varexp_list, body) = core_type_tuple __loc__ "_" tyl in
     [%case {| $uid:cid$ ( $tuplelist:varpat_list$ ) ->
             ((const string $string:cid$) ++ (const string " ") ++ (parens (fun pps ( $tuplelist:varpat_list$ ) -> $body$))) pps ( $tuplelist:varexp_list$ ) |}]

  | [%constructor_declaration.loc {| $uid:cid$ of { $list:fields$ } |}] ->
     let (patbinding_list, (varpat_list, varexp_list), body) = record_type __loc__ fields in
     [%case {| $uid:cid$ { $list:patbinding_list$ } ->
             ((const string $string:cid$) ++ (const string " ") ++ (braces (fun pps ( $tuplelist:varpat_list$ ) -> $body$)))
             pps ( $tuplelist:varexp_list$ ) |}]

and record_type __loc__ fields =
  let ids_types =
    fields
    |>  List.map (function [%field {| $mutable:_$ $lid:l$ : $typ:t$ $algattrs:_$ |}] ->
                    (l, t))  in
  let patbinding_list =
    ids_types |> List.map (fun (id,_) ->
                     let li = [%longident_t {| $lid:id$ |}] in
                     (Location.mkloc li __loc__,
                      [%pattern {| $lid:id$ |}])) in

  let varpat_list = ids_types |> List.map (fun (id, _) -> [%pattern {| $lid:id$ |}]) in
  let varexp_list = ids_types |> List.map (fun (id, _) -> [%expression {| $lid:id$ |}]) in

  let fmtstring = fields |> List.map (fun _ -> "%a") |> String.concat "; " in
  let pplist =
    ids_types
    |> List.concat_map (fun (id, t) ->
           let ppt = core_type "_" t in
           [ [%expression {| (const string $string:id$) ++ (const string " = ") ++ $ppt$ |}]
           ; [%expression {| $lid:id$ |}] ]) in
  
  let body = expr_applist __loc__ [%expression {| (pf pps $string:fmtstring$) |}] pplist in
  (patbinding_list, (varpat_list, varexp_list), body)

let type_decl = function
    [%type_decl.loc {| $list:tvl$ $lid:tname$ = $ty$ |}] ->
     let pp_name = Fmt.(str "pp_%s" tname) in
     let params = List.map (function ([%core_type {| ' $lid:v$ |}], _) -> Fmt.(str "pp_param_%s" v)) tvl in

     let f = core_type "_" ty in
     let rhs = [%expression {| fun pps x -> Fmt.(pf pps "%a" $f$ x) |}] in
     let rhs = List.fold_right (fun v rhs -> [%expression {| fun $lid:v$ -> $rhs$ |}]) params rhs in
     [%value_binding {| $lid:pp_name$ = $rhs$ |}]

  | [%type_decl.loc {| $list:tvl$ $lid:tname$ = $constructorlist:cl$ |}] ->
     let pp_name = Fmt.(str "pp_%s" tname) in
     let params = List.map (function ([%core_type {| ' $lid:v$ |}], _) -> Fmt.(str "pp_param_%s" v)) tvl in

     let branches = List.map constructor_decl cl in
     let rhs = [%expression {| fun pps ->  Fmt.(function $list:branches$) |}] in
     let rhs = List.fold_right (fun v rhs -> [%expression {| fun $lid:v$ -> $rhs$ |}]) params rhs in

     [%value_binding {| $lid:pp_name$ = $rhs$ |}]

  | [%type_decl.loc {| $list:tvl$ $lid:tname$ = { $list:fields$ } |}] ->
     let pp_name = Fmt.(str "pp_%s" tname) in
     let params = List.map (function ([%core_type {| ' $lid:v$ |}], _) -> Fmt.(str "pp_param_%s" v)) tvl in

     let (patbinding_list, _, body) = record_type __loc__ fields in
     let rhs = [%expression {| Fmt.(braces (fun pps { $list:patbinding_list$ } -> $body$)) |}] in
     let rhs = List.fold_right (fun v rhs -> [%expression {| fun $lid:v$ -> $rhs$ |}]) params rhs in

     [%value_binding {| $lid:pp_name$ = $rhs$ |}]

let top_si si = match si with
    [%structure_item.loc {| type $nonrecflag:rf$ $list:tdl$ |}] ->
    let bindings = List.map type_decl tdl in
    [si ; [%structure_item {| let $recflag:rf$ $list:bindings$ |}]]

let top l = List.concat_map top_si l 

The output of the program

And it works: here are types and the generated pretty-print functions:

type tint = int
let rec pp_tint pps x = let open Fmt in pf pps "%a" int x
type tstring = string
let rec pp_tstring pps x = let open Fmt in pf pps "%a" Dump.string x
type ('a, 'b) choice =
  | Left of 'a 
  | Right of 'b 
let rec pp_choice pp_param_a pp_param_b pps =
  let open Fmt in
    function
    | Left (__0) ->
        (((const string "Left") ++ (const string " ")) ++
           (parens (fun pps -> fun (__0) -> pf pps "%a" pp_param_a __0))) pps
          (__0)
    | Right (__0) ->
        (((const string "Right") ++ (const string " ")) ++
           (parens (fun pps -> fun (__0) -> pf pps "%a" pp_param_b __0))) pps
          (__0)
type ('a, 'b) pair = ('a * 'b)
let rec pp_pair pp_param_a pp_param_b pps x =
  let open Fmt in
    pf pps "%a"
      (parens
         (fun pps ->
            fun (__0, __1) -> pf pps "%a,%a" pp_param_a __0 pp_param_b __1))
      x
type t1 = (int * bool * t2 option)
and t2 = {
  f: string ;
  g: bool }
and t3 =
  | C 
  | D of int * int 
  | E of {
  h: int ;
  j: string } 
and t4 = string
let rec pp_t1 pps x =
  let open Fmt in
    pf pps "%a"
      (parens
         (fun pps ->
            fun (__0, __1, __2) ->
              pf pps "%a,%a,%a" int __0 bool __1
                (option ~none:(const string "None")
                   ((const string "Some ") ++ pp_t2)) __2)) x
and pp_t2 =
  let open Fmt in
    braces
      (fun pps ->
         fun { f; g } ->
           pf pps "%a; %a"
             (((const string "f") ++ (const string " = ")) ++ Dump.string) f
             (((const string "g") ++ (const string " = ")) ++ bool) g)
and pp_t3 pps =
  let open Fmt in
    function
    | C -> const string "C" pps ()
    | D (__0, __1) ->
        (((const string "D") ++ (const string " ")) ++
           (parens
              (fun pps -> fun (__0, __1) -> pf pps "%a,%a" int __0 int __1)))
          pps (__0, __1)
    | E { h; j } ->
        (((const string "E") ++ (const string " ")) ++
           (braces
              (fun pps ->
                 fun (h, j) ->
                   pf pps "%a; %a"
                     (((const string "h") ++ (const string " = ")) ++ int) h
                     (((const string "j") ++ (const string " = ")) ++
                        Dump.string) j))) pps (h, j)
and pp_t4 pps x = let open Fmt in pf pps "%a" Dump.string x
5 Likes