[up-front (so nobody gets the wrong idea): I’m not pushing Camlp5.
Rather, I’m just noting that this sort of thing is really easy to do,
and I encourage someone to do something similar using the PPX
infrastructure.]
I didn’t want to derail the “Future of PPX” thread, so I thought I’d
post separately to answer ivg@ 's issue about hashconsing of ASTs
using PPX. It’s actually [uh, I think] really, really easy to
implement hashconsing of ADTs, using a PPX extension. On a lark, I
decided to do it today, and while the code I’ve got isn’t sufficient
to use, I think it’s not very far away, and I have the perfect
use-case already in-mind. It took me two hours to implement the
rewriter and the testcase, on top of the other infrastructure, which
has no support for hashconsing of any sort.
Here are some examples of data-types and functions that are
automaticaly hash-consed. The idea is that in the pattern-match the
pattern is annotated with a variable (in this example, “z”); the
expression that is supposed to be hash-consed against that pattern is
annotated with that same variable. [The code that descends to the
expression is a little weak right now, but I think that’s easily
fixable.] The algorithm goes as follows:
(1) “decorate” the pattern with “as z_” variables everywhere
in constructors. This allows us to refer to parts of the original
value.
(2) then find each expression that is marked with that same varable.
Structurally descend the pattern and the expression in parallel and
generate code to compare sub-structure and hashcons where appropriate.
And that’s really it. I’m sure this can be implemented using the PPX
tools.
Some comments: (1) what’s nice, is that we can just take already-written
code like List.map
and annotate it; that generates a hash-consed
version. And since the generated code never uses deep structural
equality (only pointer-equality) it should be only marginally slower
than the original implementation.
(2) The variable in the annotation (“z”) is used as the base for
generating a whole slew of fresh variables, and I don’t bother (yet)
to check for clashes; this (again) is straightforward, but hey, I
started two hours ago.
type t = Leaf of int | Node of t * int * t
module HCList = struct
let rec map f = function
[][@hashrecons z] -> [][@hashrecons z]
| (a::l)[@hashrecons z] -> let r = f a in ((r :: map f l)[@hashrecons z])
end
let deep =
let rec deep = (function
Leaf n[@hashrecons z] -> Leaf n[@hashrecons z]
| Node (l, n, r) [@hashrecons z] ->
Node (deep l, n, deep r) [@hashrecons z]
)
[@@ocaml.warning "-26"]
in deep
type sexp =
| Atom of string
| List of sexp list
let sexp_deep =
let rec deep = function
Atom s[@hashrecons z] -> Atom s[@hashrecons z]
| List l[@hashrecons z] -> List (HCList.map deep l)[@hashrecons z]
in deep
Links: First, at the commit, so they won’t change
the testcase file: https://github.com/chetmurthy/pa_ppx/commit/5dd6b2ef3ca3677e11a0ad696074200101bd661f#diff-e6dffe78fc6c27bdffa41970c4a7f1ca
the “ppx rewriter”: https://github.com/chetmurthy/pa_ppx/commit/5dd6b2ef3ca3677e11a0ad696074200101bd661f#diff-24aeaf51366017948f5735727f001c85
Second, the files with human-readable names, etc.:
the testcase: https://github.com/chetmurthy/pa_ppx/blob/master/tests/test_hashrecons.ml
the “ppx rewriter”: https://github.com/chetmurthy/pa_ppx/blob/master/pa_hashrecons/pa_hashrecons.ml
The projects:
chetmurthy/pa_ppx: A reimplementation of ppx_deriving, all its plugins, ppx_import, and a
few others.
chetmurthy/camlp5: Camlp5, version pre-8.00 on which the above is based. This is on the branch 26.attempt-pa-deriving .