Hashconsing an AST via PPX

[up-front (so nobody gets the wrong idea): I’m not pushing Camlp5.
Rather, I’m just noting that this sort of thing is really easy to do,
and I encourage someone to do something similar using the PPX
infrastructure.]

I didn’t want to derail the “Future of PPX” thread, so I thought I’d
post separately to answer ivg@’s issue about hashconsing of ASTs
using PPX. It’s actually [uh, I think] really, really easy to
implement hashconsing of ADTs, using a PPX extension. On a lark, I
decided to do it today, and while the code I’ve got isn’t sufficient
to use, I think it’s not very far away, and I have the perfect
use-case already in-mind. It took me two hours to implement the
rewriter and the testcase, on top of the other infrastructure, which
has no support for hashconsing of any sort.

Here are some examples of data-types and functions that are
automatically hash-consed. The idea is that in the pattern-match the
pattern is annotated with a variable (in this example, “z”); the
expression that is supposed to be hash-consed against that pattern is
annotated with that same variable. [The code that descends to the
expression is a little weak right now, but I think that’s easily
fixable.] The algorithm goes as follows:

(1) “decorate” the pattern with “as z_” variables everywhere
in constructors. This allows us to refer to parts of the original
value.

(2) then find each expression that is marked with that same variable.
Structurally descend the pattern and the expression in parallel and
generate code to compare sub-structure and hashcons where appropriate.

And that’s really it. I’m sure this can be implemented using the PPX
tools.

Some comments: (1) What’s nice is that we can just take already-written
code like List.map and annotate it; that generates a hash-consed
version. And since the generated code never uses deep structural
equality (only pointer-equality) it should be only marginally slower
than the original implementation.

(2) The variable in the annotation (“z”) is used as the base for
generating a whole slew of fresh variables, and I don’t bother (yet)
to check for clashes; this (again) is straightforward, but hey, I
started two hours ago.
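For comparison, here is roughly what a sharing-preserving List.map looks like when written out by hand (the names smap and orig are mine, not the rewriter’s; the annotated HCList.map below is meant to generate something equivalent):

```ocaml
(* A hand-written sharing-preserving map: if [f] returns each element
   physically unchanged and the mapped tail is physically unchanged,
   the original cons cell is returned instead of a fresh one. *)
let rec smap f = function
  | [] -> []
  | (a :: tl) as orig ->
      let a' = f a in
      let tl' = smap f tl in
      if a' == a && tl' == tl then orig else a' :: tl'
```

With this, smap (fun x -> x) l is physically equal to l, whereas ordinary List.map always reallocates the spine — which is where the “only marginally slower” claim comes from: the extra work per cell is two pointer comparisons.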

type t = Leaf of int | Node of t * int * t

module HCList = struct

let rec map f = function
    [][@hashrecons z] -> [][@hashrecons z]
  | (a::l)[@hashrecons z] -> let r = f a in ((r :: map f l)[@hashrecons z])

end

let deep =
let rec deep = (function
  Leaf n[@hashrecons z] -> Leaf n[@hashrecons z]
| Node (l, n, r) [@hashrecons z] -> 
  Node (deep l, n, deep r) [@hashrecons z]
  )
[@@ocaml.warning "-26"]
in deep


type sexp = 
  | Atom of string 
  | List of sexp list

let sexp_deep =
  let rec deep = function
      Atom s[@hashrecons z] -> Atom s[@hashrecons z]
    | List l[@hashrecons z] -> List (HCList.map deep l)[@hashrecons z]
  in deep
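To make the algorithm concrete, here is a hand-expanded approximation (mine, not the rewriter’s actual output) of what the [@hashrecons z] annotations on the tree-type deep function above are meant to elaborate to:

```ocaml
(* The "as z" binding keeps a handle on the original value; after
   rewriting the children we compare them with pointer equality (==)
   and reuse the original node when nothing changed. *)
type t = Leaf of int | Node of t * int * t

let rec deep = function
  | Leaf _ as z -> z                        (* no fields changed: reuse *)
  | Node (l, n, r) as z ->
      let l' = deep l and r' = deep r in
      if l' == l && r' == r then z          (* children unchanged: reuse *)
      else Node (l', n, r')
```

On an unchanged tree this returns the original value itself (deep x == x), so an identity pass allocates nothing.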

Links: first, pinned to the commit, so they won’t change:

the testcase file: https://github.com/chetmurthy/pa_ppx/commit/5dd6b2ef3ca3677e11a0ad696074200101bd661f#diff-e6dffe78fc6c27bdffa41970c4a7f1ca

the “ppx rewriter”: https://github.com/chetmurthy/pa_ppx/commit/5dd6b2ef3ca3677e11a0ad696074200101bd661f#diff-24aeaf51366017948f5735727f001c85

Second, the files with human-readable names, etc.:

the testcase: https://github.com/chetmurthy/pa_ppx/blob/master/tests/test_hashrecons.ml

the “ppx rewriter”: https://github.com/chetmurthy/pa_ppx/blob/master/pa_hashrecons/pa_hashrecons.ml

The projects:

chetmurthy/pa_ppx: A reimplementation of ppx_deriving, all its plugins, ppx_import, and a
few others.

chetmurthy/camlp5: Camlp5, version pre-8.00, on which the above is based. This is on the branch 26.attempt-pa-deriving.


I experimented with this some time ago for ML workshop.
The idea was to provide a function t -> htbl -> htbl * t that rewrites a value of type t by removing equal subtrees. Essentially it is just a fold over the data type.

If you wanna use a hashtable (and, I presume, Obj.magic) you can write a single function that does the trick for all immutable data-types, right? But if the goal is merely to re-share unchanged data after a single pass over an AST, you don’t need to go that far.
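A type-specific version of that idea is easy to sketch without Obj.magic. Here is one possible shape, for the sexp type from earlier in the thread (the function names are mine, and a real implementation would likely want a better hash than the polymorphic default):

```ocaml
(* Bottom-up hash-consing through a table: structurally equal subtrees
   are collapsed to a single, physically shared representative. *)
type sexp = Atom of string | List of sexp list

let hashcons () =
  let tbl : (sexp, sexp) Hashtbl.t = Hashtbl.create 251 in
  let rec go s =
    (* canonicalize the children first, then look up the rebuilt node *)
    let rebuilt = match s with
      | Atom _ -> s
      | List l -> List (List.map go l)
    in
    match Hashtbl.find_opt tbl rebuilt with
    | Some canonical -> canonical            (* seen before: share it *)
    | None -> Hashtbl.add tbl rebuilt rebuilt; rebuilt
  in
  go
```

After such a pass, structural equality of canonicalized values coincides with pointer equality, which is what makes all subsequent comparisons cheap.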

Yeah, probably you are right.
But the idea was to reuse an existing framework where you can override the transformation behaviour per constructor.

Yes, we have some magic @mbouaziz code in Infer that does this to create as much sharing as possible as values are Marshaled out.


Could you provide a minimal end-user example in the README.md of how to use these extensions, e.g. with pa_deriving.enum and a Dune-based “hello world”?

Hi, I just pushed the code here: chetmurthy/pa_ppx.

There’s a README that explains:

  1. how to build and use the code. Right now, it requires a pre-release version of camlp5, and I’m not sufficiently well-versed in the opam repository package-lifecycle to know how I could make this version of camlp5 available as an opam package without disrupting the already-released version.
  2. how to use the package with Make.
  3. how to use the package with dune.

For the latter, I include a patch for the yara-ocaml package’s dune file which suffices.

I’d be very interested in any feedback, howsoever elementary. Things that seem obvious to me aren’t obvious at all to others, so even just build/install problems are worth reporting.
