Ppx deriving enum

I do not intend to sound entitled / complaining here. Genuine questions:

  1. Is this https://github.com/ocaml-ppx/ppx_deriving/blob/v5.2.1/src_plugins/enum/ppx_deriving_enum.cppo.ml the correct code for ppx deriving enum? Or is it the output of the macro? It is quite a bit more verbose than I expected.

  2. Assuming the above is hand-written: is this the expected level of verbosity when writing deriving macros? I do not want to sound like a complainer here, but this macro would be much, much shorter in Clojure / Scheme.

  3. Besides camlp4/camlp5, are there efforts to make this shorter?

Thanks!

  1. Yes, that’s the code implementing [@@deriving enum]. PPXs transform the OCaml AST by looking at the AST of the type and inserting AST for some additional value/function definitions. OCaml’s AST is quite large and there are many distinct constructs (e.g. variants and polymorphic variants are completely separate things that such a PPX needs to handle). Compared to other PPXs, enum is still quite small.

  2. Yes, that’s hand-written. Depending on what you’re trying to do, a PPX likely is going to require even more code.

    I’m not familiar with Lisp macros, but if they’re just sexp transformations, then it’s not surprising that they’d be much simpler because OCaml’s AST is far more complex than just sexps.

  3. There are various attempts at that:

    1. GitHub - janestreet/ppx_type_directed_value: Get [@@deriving]-style generation of type-directed values without writing a ppx.
    2. GitHub - thierry-martinez/refl: OCaml PPX deriver for reflection and other runtime types/reflection libraries.
    3. GitHub - sim642/ppx_easy_deriving is my recent (unreleased) attempt of doing something simpler with no performance compromises.
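For a sense of scale on point 1, here is a hand-written sketch of roughly what [@@deriving enum] produces for a small variant. The type `color` is a made-up example, and the names follow the convention in the ppx_deriving README as I remember it (`<type>_to_enum`, `<type>_of_enum`, `min_<type>`, `max_<type>`); treat it as an illustration rather than the plugin's exact output.

```ocaml
(* Hand-written stand-in for the code [@@deriving enum] would generate.
   [color] is a hypothetical example type. *)
type color = Red | Green | Blue

(* Smallest and largest integer assigned to a constructor. *)
let min_color = 0
let max_color = 2

(* Constructors are numbered in declaration order, starting at 0. *)
let color_to_enum = function
  | Red -> 0
  | Green -> 1
  | Blue -> 2

(* The inverse is partial, so it returns an option. *)
let color_of_enum = function
  | 0 -> Some Red
  | 1 -> Some Green
  | 2 -> Some Blue
  | _ -> None
```

The generated code is short; the length of the plugin comes from producing it as AST nodes for both ordinary and polymorphic variants.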

Two alternative approaches that might interest you:

  1. The corresponding code from the old Camlp4-based deriving extension:
    https://github.com/jaked/deriving/blob/master/syntax/enum_class.ml

  2. Safe and efficient generic functions with MacoCaml (pdf), an outline of a talk at OCaml 2023 that shows how to write these kinds of functions using some new OCaml features (compile-time let bindings and typed code quotations) that we’re developing.

MetaOCaml is the closest in spirit to lispy macros I’ve seen. It lags the official release a bit (up to 4.14 at this point, it looks like), and editor support is a very open question, but it works.

This looks outstanding (and of little surprise, explicitly inspired by MetaOCaml). Thank you for linking!

May I ask, what are the origins of the “MacoCaml” name?

The old Camlp4 code is (of course) dead. But another alternative is the Camlp5-based pa_ppx code, which implements many of the same standard derivers, as well as other PPX rewriters. Here’s the enum PPX deriver (there are a bunch of derivers in the same directory):

Obviously, unless you’re already using Camlp5, these aren’t a good choice, b/c Camlp5-based PPX rewriters don’t cooperate/interoperate with the standard PPX infrastructure.

What exactly does “don’t cooperate/interoperate” mean ?

Within a single [@@deriving ...] in a single *.ml file, can I use both camlp5-ppx and normal-ppx?

If so, where is the limitation; if not, when can the two be used together ?

Dumb question: how do I get MacoCaml? I tried googling for its GitHub repo, but all I’m finding are PDFs, no actual repo.

Looks like this modular-macros · GitHub (linked from the paper).

Quoting the readme:

This is an experimental implementation of OCaml macros, based on version 4.04 of OCaml.

Am I reading this correctly? It brings us back to 4.04 …

It is a matter of sequencing.

  1. the standard PPX infrastructure is driven from the OCaml AST, after parsing by the OCaml parser. I’ll refer to this as ppxlib.

  2. the Camlp5-based PPX infrastructure is driven from Camlp5’s MLast, which is parsed by the Camlp5 parser. This happens before an AST is handed to OCaml. I’ll refer to these as pa_ppx.

So the actual series of events is:

a. text goes to Camlp5’s parser, which builds MLast (in Camlp5’s runtime)
b. transformers (if any), e.g. pa_ppx rewriters are executed (again, within Camlp5’s runtime)
c. finally, Camlp5’s MLast is converted into OCaml’s AST and squirted into a file
d. then ocamlc is invoked on that file (again, containing serialized OCaml AST)
e. then the OCaml compiler can invoke PPX rewriters (ppxlib)
f. finally the result of that is again a file of OCaml AST, which the OCaml compiler processes in the normal way.

So if you have a PPX extension/attribute that is completely unrecognized by pa_ppx, then later on it can be processed by ppxlib. But in the case of derivers, that cannot happen, b/c @@deriving is itself an attribute, and if you had [@@deriving a,b] then either both a and b need to be pa_ppx derivers or both need to be ppxlib derivers. B/c if pa_ppx loads the deriving rewriter, then it will process all deriver declarations (a and b).

Hope that makes sense.

P.S. This is pretty-much baked-in from the beginning, since Camlp5 is parse-time, and PPX is -after-parse-time.
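The sequencing in steps a-f can be sketched as a toy model. This is not Camlp5 or ppxlib code: an "AST" here is just a list of attribute names waiting to be expanded, and `run_stage` and `pipeline` are hypothetical names invented for the illustration.

```ocaml
(* Toy model only: the "AST" is the list of attribute names that still
   need expanding. Nothing here is real Camlp5 or ppxlib API. *)
type ast = string list

(* A stage expands every attribute it has a rewriter loaded for and passes
   the rest through. Expanded attributes are reported as (stage, attribute). *)
let run_stage ~stage ~loaded (ast : ast) =
  let handled, remaining = List.partition (fun a -> List.mem a loaded) ast in
  (List.map (fun a -> (stage, a)) handled, remaining)

(* The Camlp5 stage (steps a-c) always runs before the ppxlib stage (d-f). *)
let pipeline ~pa_ppx_loaded ~ppxlib_loaded ast =
  let by_pa_ppx, rest = run_stage ~stage:"pa_ppx" ~loaded:pa_ppx_loaded ast in
  let by_ppxlib, leftover =
    run_stage ~stage:"ppxlib" ~loaded:ppxlib_loaded rest in
  (by_pa_ppx @ by_ppxlib, leftover)

(* Both stages have a rewriter for "deriving"; only ppxlib knows "other". *)
let expanded, leftover =
  pipeline ~pa_ppx_loaded:["deriving"] ~ppxlib_loaded:["deriving"; "other"]
    ["deriving"; "other"]
(* expanded = [("pa_ppx", "deriving"); ("ppxlib", "other")], leftover = [] *)
```

In this model the ppxlib stage never sees "deriving", because the earlier stage has already consumed it, while "other" passes through untouched and is picked up later.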


I don’t get the sense that anyone is supposed to use that implementation in anger, thus the “experimental” admonition. Its purpose was to demonstrate the utility of the design presented in Modular Macros (2015), which seems to be part of MacoCaml’s lineage. Per the “future work” section of the 2023 paper:

In the future we plan to extend the formalism to support OCaml’s
full module system, including functors and signatures with abstract
types and subtyping, and to bring the implementation to a state where
it can be merged into the main OCaml distribution.

So… maybe there will be a surprise at OCaml 2023? ;)

This explains a lot. Thank you.

I am still a bit confused on this. Let us consider:

(* shared-expr *)
type t = ... [@@deriving pa_ppx_foo, normal_ppx_bar]
(* shared-file *)
(* A.ml: *)
type t = ... [@@deriving pa_ppx_foo];;
type t2 = ... [@@deriving normal_ppx_bar]
(* shared-project *)
(* A.ml: *)
type t = ... [@@deriving pa_ppx_foo];;
(* B.ml: *)
type t2 = ... [@@deriving normal_ppx_bar]

Based on what you stated above, I am convinced that shared-expr is not allowed.

However, is shared-file or shared-project allowed ?

[Before reading all this, please keep at the front of your mind that there are rewriters and derivers. A deriver is a plugin to the @@deriving rewriter.]

Remember that @@deriving is an attribute. The fact that it has a payload is a matter for the PPX rewriter that processes it. The only thing that the infrastructure (either pa_ppx or ppxlib) does, is to decide whether any particular instance of an attribute, should be handled by some particular rewriter it has loaded. So in your example, there are three instances of @@deriving. They’re going to be handed-off by pa_ppx/ppxlib to some code, and that’s going to happen without looking at the payload, right? Or at least, that’s how I implemented it. Then, the code for @@deriving will look at the payload and figure out which derivers to invoke.

So either pa_ppx has loaded a PPX rewriter for @@deriving, or it has not. If it has, then that rewriter is going to try to process these attributes. If no such rewriter has been loaded, then these attributes will be passed unmodified to ppxlib, and if ppxlib has loaded a rewriter for @@deriving, then that rewriter can process these attributes.

From this, you can see that if both infrastructures load rewriters, then the first one (pa_ppx) gets a crack, and however it processes the attribute, is how it will be processed.
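The rewriter/deriver distinction can be sketched as a plugin registry. This is a toy model with made-up names (`derivers`, `register`, `deriving_rewriter`), not how pa_ppx or ppxlib are actually written: the infrastructure picks the rewriter by attribute name alone, and only the rewriter looks at the payload.

```ocaml
(* Toy model of "a deriver is a plugin to the @@deriving rewriter".
   All names are illustrative; none come from pa_ppx or ppxlib. *)

(* Derivers registered with ONE infrastructure's @@deriving rewriter.
   Each maps a type name to a string standing in for generated code. *)
let derivers : (string, string -> string) Hashtbl.t = Hashtbl.create 8

let register name gen = Hashtbl.replace derivers name gen

(* Called once the infrastructure has matched the attribute name
   "deriving". Only at this point is the payload (the list of deriver
   names) inspected, and every name must be known to this registry. *)
let deriving_rewriter ~type_name payload =
  List.map
    (fun name ->
      match Hashtbl.find_opt derivers name with
      | Some gen -> Ok (gen type_name)
      | None -> Error ("unknown deriver: " ^ name))
    payload

let () =
  register "show" (fun t -> "let show_" ^ t ^ " = ...");
  register "eq" (fun t -> "let equal_" ^ t ^ " = ...")
```

With this registry, `deriving_rewriter ~type_name:"t" ["show"; "foo"]` yields one `Ok` and one `Error`: a deriver registered with the other infrastructure is simply unknown to the rewriter that got the attribute first.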

BTW, my understanding of the spec for @@deriving is that when you have an attribute

type t = ... [@@deriving a,b]

then it’s supposed to be expanded to

.... code generated by the deriver ...
[@@@end]

Or at least, that’s how I remember it. And it’s how I implemented it. I have a vague memory that ppxlib does not implement this "rewrite the attribute" (notice the "_inline") but maybe it does; I rarely use or look at ppxlib rewriters/derivers except when I’m doing compatibility testing.

In any case, if this is the way things are supposed to be expanded, then clearly, only one @@deriving rewriter can get a whack at the @@deriving attribute.

You asked about other work to make these PPX rewriters shorter. I’ve posted about this before ( Quasi-quotations for the OCaml AST and PPX rewriters ), but since then, (a) did a ton more work, and (b) got bored and wandered off before releasing code. But (c) now that OCaml 5.1.0 is coming out, I’m updating everything to work with that. So …

The punchline first: suppose you look at ppx_jsobject_conv and observe that “wow, it’s complicated”. I would argue that that’s right, b/c it doesn’t use quasi-quotations. But that’s fixable, and at the bottom of this note, I have a link to a project where I rewrote the guts of ppx_jsobject_conv with quasi-quotations, and it’s much, much more readable. At least, to me. I mean, I get that there are people who think “pervasive quasi-quotations” are useless or ugly or whatever.

So something you could do (or I could do, but I doubt anybody would be interested in it, so why would I bother wasting the effort?) would be to take the enum deriver you pointed-at, and rewrite it using pa_ppx_parsetree. Just to show how much more transparent it is. But hey, I already did that with the “show” deriver (pointer below) and nobody was interested, so … ah, nevertheless [h/t Atrios].

Now, the short explanation: I’ve been working on something called pa_ppx_parsetree for a while: GitHub - camlp5/pa_ppx_parsetree: Tooling for doing things with OCaml's official AST

What is it? It’s a package that provides something like ppx_metaquot, but for every major syntactic category and pretty much every parsing rule of the OCaml grammar, and does so for every version of the OCaml grammar from 4.10.0 thru 5.1.0, inclusive. So what do I mean?

pattern-match longidents: [%longident_t {| $longid:li1$. $lid:l$ |}]

pattern-match the list of cases in a match: [%expression {| match $e1$ with $list:cases$ |}]

pattern-match a case-branch: [%case {| $p1'$ when $e1'$ -> $e2'$ |}]

and of course, all of these work as expressions, too. [there’s some trickery done to get the locations to work without too much fuss].
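For comparison, here is what the first quotation pattern would look like written out with constructors. This is a sketch under two assumptions: the `longident` type below is a local copy of the shape of the compiler's Longident.t in the 4.10-5.1 range (so the example needs no compiler-libs), and the quotation `$longid:li1$. $lid:l$` is taken to correspond to the `Ldot` pattern. `split_last` is a hypothetical helper name.

```ocaml
(* Local copy of the shape of Longident.t (OCaml 4.10-5.1), so the example
   is self-contained. The real type lives in compiler-libs. *)
type longident =
  | Lident of string
  | Ldot of longident * string
  | Lapply of longident * longident

(* Assumed constructor-level counterpart of
   [%longident_t {| $longid:li1$. $lid:l$ |}]:
   split a dotted path into its prefix and last component. *)
let split_last = function
  | Ldot (li1, l) -> Some (li1, l)
  | Lident _ | Lapply _ -> None
```

For a node this small the two forms are about the same length. The gap grows for expressions and cases, where the constructor form has to spell out record fields for locations and attributes that the quotation hides.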

pa_ppx_parsetree has an example: a “show” deriver (though not hooked-into any infrastructure): https://github.com/camlp5/pa_ppx_parsetree/blob/master/tests/test_show.ml

and I’ve also used it to rewrite the guts of ppx_jsobject_conv with quasi-quotations everywhere: https://github.com/chetmurthy/ppx_jsobject_conv/blob/experimental-parsetree-quotation-hacking/src2/code.orig.ml

In that project:

  1. it’s on the branch experimental-parsetree-quotation-hacking
  2. in the directory src2 (src is the original deriver, so I can compare)
  3. it’s set up so I can build it against the OCaml AST of any version 4.10-5.0 inclusive (haven’t tested it against 5.1). The idea being, sure/sure/SURE/SURE it’s nice to fix a single version of the AST and use omp to migrate to it, run your rewriter, and migrate back. But it’s not necessary, and this demonstrates that for this PPX deriver, you don’t have to do it: the code compiles with any version of the OCaml AST.

Okay, so the key question is “did pa_ppx load a ppx rewriter” ? What determines that? Is it a line like:

ocamlfind ocamlc -package camlp5,fmt -syntax camlp5o \
    -linkpkg streams.ml -o streams

If that is the case, it implies that this happens at a file granularity:

  1. for every file, either all @@deriving is handled by pa_ppx; or all @@deriving is handled by normal ppx

  2. for different files in the same dune project, it is possible one file uses pa_ppx for @@deriving and another file uses normal_ppx for @@deriving

Are the two above assertions correct ?

Thanks!

The typical way in which Camlp5-based packages are invoked, is via ocamlfind, just as you describe. So it happens at whatever granularity your build-system allows, and in any case, the finest granularity is the file-level.

I don’t use dune, but the GT project does (use dune) and uses both ppxlib and Camlp5. There, they generate preprocessor executables using mkcamlp5/mkcamlp5.opt and then use those to preprocess their files before handing them to ocamlc.

Sorry, I’m not really answering your question (as to those two assertions) b/c I don’t use dune. I use Makefiles, and yes, it’s straightforward to control on a file-by-file basis the findlib packages that are loaded for a particular source file’s compilation. Which amounts to saying that for file a.ml I can choose to load Camlp5 packages, and for file b.ml in the same directory, I can choose not to.


I think the gist of everything makes sense now. Thanks for your patience in answering all my followup questions!