[up-front (so nobody gets the wrong idea): I’m not pushing Camlp5.
Rather, I’m just noting that this sort of thing is really easy to do,
and I encourage someone to do something similar using the PPX
I didn’t want to derail the “Future of PPX” thread, so I thought I’d
post separately to answer ivg@ 's issue about hashconsing of ASTs
using PPX. It’s actually [uh, I think] really, really easy to
implement hashconsing of ADTs, using a PPX extension. On a lark, I
decided to do it today, and while the code I’ve got isn’t sufficient
to use, I think it’s not very far away, and I have the perfect
use-case already in-mind. It took me two hours to implement the
rewriter and the testcase, on top of the other infrastructure, which
has no support for hashconsing of any sort.
Here are some examples of data-types and functions that are
automaticaly hash-consed. The idea is that in the pattern-match the
pattern is annotated with a variable (in this example, “z”); the
expression that is supposed to be hash-consed against that pattern is
annotated with that same variable. [The code that descends to the
expression is a little weak right now, but I think that’s easily
fixable.] The algorithm goes as follows:
(1) “decorate” the pattern with “as z_” variables everywhere
in constructors. This allows us to refer to parts of the original
(2) then find each expression that is marked with that same varable.
Structurally descend the pattern and the expression in parallel and
generate code to compare sub-structure and hashcons where appropriate.
And that’s really it. I’m sure this can be implemented using the PPX
Some comments: (1) what’s nice, is that we can just take already-written
List.map and annotate it; that generates a hash-consed
version. And since the generated code never uses deep structural
equality (only pointer-equality) it should be only marginally slower
than the original implementation.
(2) The variable in the annotation (“z”) is used as the base for
generating a whole slew of fresh variables, and I don’t bother (yet)
to check for clashes; this (again) is straightforward, but hey, I
started two hours ago.
type t = Leaf of int | Node of t * int * t module HCList = struct let rec map f = function [@hashrecons z] -> [@hashrecons z] | (a::l)[@hashrecons z] -> let r = f a in ((r :: map f l)[@hashrecons z]) end let deep = let rec deep = (function Leaf n[@hashrecons z] -> Leaf n[@hashrecons z] | Node (l, n, r) [@hashrecons z] -> Node (deep l, n, deep r) [@hashrecons z] ) [@@ocaml.warning "-26"] in deep type sexp = | Atom of string | List of sexp list let sexp_deep = let rec deep = function Atom s[@hashrecons z] -> Atom s[@hashrecons z] | List l[@hashrecons z] -> List (HCList.map deep l)[@hashrecons z] in deep
Links: First, at the commit, so they won’t change
Second, the files with human-readable names, etc.:
the “ppx rewriter”: https://github.com/chetmurthy/pa_ppx/blob/master/pa_hashrecons/pa_hashrecons.ml
chetmurthy/pa_ppx: A reimplementation of ppx_deriving, all its plugins, ppx_import, and a
chetmurthy/camlp5: Camlp5, version pre-8.00 on which the above is based. This is on the branch 26.attempt-pa-deriving .