PPX interface spec (with ocamlc/ocamlopt)?

I searched a bit, and couldn’t find a documented specification for how ocamlc/ocamlopt/dune (and perhaps ocamlfind) invoke PPX rewriters. Is this documented someplace? Failing that, I’m going to reverse-engineer it from the code, but it sure would be nice if there were some specification.

Anybody know of one?

P.S. what I mean is, the precise way in which the compiler will invoke these PPX rewriter executables, with what arguments, and what those arguments mean. It would be nice to know the same for “staged_pps” rewriters.

The manual chapter “OCaml - Native-code compilation (ocamlopt)” says:

-ppx command
After parsing, pipe the abstract syntax tree through the preprocessor command. The module Ast_mapper, described in chapter 29: Ast_mapper, implements the external interface of a preprocessor.

The linked chapter starts with:

The interface of a -ppx rewriter

A -ppx rewriter is a program that accepts a serialized abstract syntax tree and outputs another, possibly modified, abstract syntax tree. This module encapsulates the interface between the compiler and the -ppx rewriters, handling such details as the serialization format, forwarding of command-line flags, and storing state.

Not sure if this is enough detail or not.


Regarding Dune’s staged_pps, I think each one of those might be a separate -ppx argument or something. Normally with ppxlib (and possibly even ppx_deriving), all the PPX rewriters and derivers are linked into a single executable that runs them all.

To be frank, it’s not a spec. But that’s OK: I’m reverse-engineering it.

It appears that the interface of a PPX rewriter executable is

EXECUTABLE file1 file2

The serialized AST is read from file1 (as either an interface or an implementation; I haven’t figured out how the two are distinguished, maybe by the magic number), and the result of PPX rewriting is written to file2, again as a serialized AST.
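To make the two-file convention concrete, here is a minimal sketch of the simplest possible rewriter: one that returns the AST unchanged by copying file1 to file2 byte for byte. It assumes only what the post above observes (two positional arguments, input then output). The name identity_ppx and the helper copy_file are hypothetical, and a real rewriter would deserialize and transform the AST rather than copy it.

```ocaml
(* Hypothetical identity rewriter: invoked as  identity_ppx FILE1 FILE2.
   It copies the serialized AST in FILE1 to FILE2 without decoding it. *)

let copy_file (src : string) (dst : string) : unit =
  let ic = open_in_bin src in
  let oc = open_out_bin dst in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic; close_out_noerr oc)
    (fun () ->
      let buf = Bytes.create 65536 in
      let rec loop () =
        let n = input ic buf 0 (Bytes.length buf) in
        if n > 0 then begin
          output oc buf 0 n;
          loop ()
        end
      in
      loop ();
      (* Close explicitly so that a failed flush is reported. *)
      close_out oc)

let () =
  match Array.to_list Sys.argv with
  | [ _; src; dst ] -> copy_file src dst
  | [ _ ] -> () (* no arguments: nothing to do *)
  | _ ->
    prerr_endline "usage: identity_ppx <input-ast> <output-ast>";
    exit 2
```

Because the file is copied verbatim, whatever header the compiler wrote at the start of file1 is preserved in file2, so this sketch does not need to know whether it was handed an interface or an implementation.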

While you can use -ppx (and -pp) to invoke a preprocessor, it is not necessary (and has the downside of going through the shell).

In fact the compiler drivers ocamlc and ocamlopt will happily accept a file containing a marshalled AST instead of source code as argument. What Dune does is to build a single ppx.exe executable that bundles all PPX preprocessors in a single pass. This tool takes as input a source code file foo.ml and outputs a marshalled AST foo.pp.ml. After producing foo.pp.ml, Dune invokes the compiler with this file as argument.

Thus, the compiler never does any preprocessing on its own; all the preprocessing logic is managed by Dune and encoded in the ppx.exe helper executable.

Cheers,
Nicolas

You can use -intf and -impl to tell the compiler if the file you are passing corresponds to an interface or an implementation.

Cheers,
Nicolas

Ah, I meant the PPX rewriter executable. I tried with each of a .ml and a .mli file, and the invocation of the executable is the same. I conclude that it’s using the magic number to decide what it’s getting.

Not sure I follow; source code does not contain magic numbers. The PPX driver ppx.exe accepts the same -intf and -impl flags as the compiler to choose between interface and implementation. Absent that, it simply looks at the extension of the file (.ml or .mli).

The PPX driver logic is provided by ppxlib: https://github.com/ocaml-ppx/ppxlib/blob/main/src/driver.ml.

Cheers,
Nicolas

Here’s the trace of commands of ocamlc on ML and MLI files. The script “PPX” is:

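#!/bin/bash -x

# $1: file holding the input AST; $2: file to receive the output AST
SRC=$1
DST=$2
/home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/ppx_here/./ppx.exe --as-ppx "$SRC" "$DST"
# keep copies of both files for later inspection
cp "$SRC" saved.src
cp "$DST" saved.dst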

So it seems that the PPX driver is not passed -impl and -intf args.

ocamlc on an ML file:

 ocamlfind ocamlc -verbose -package fmt,camlp5 -ppx ./PPX -c foo.ml
Effective set of compiler predicates: pkg_fmt,pkg_compiler-libs,pkg_compiler-libs.common,pkg_camlp-streams,pkg_camlp5,autolink,byte
+ ocamlc.opt -verbose -ppx ./PPX -c -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/fmt -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/ocaml/compiler-libs -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/camlp-streams -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/camlp5 foo.ml
+ ./PPX '/tmp/camlppx309261' '/tmp/camlppxbd5172'

ocamlc on an MLI file:

+ ocamlfind ocamlc -verbose -package fmt,camlp5 -ppx ./PPX -c foo.mli
Effective set of compiler predicates: pkg_fmt,pkg_compiler-libs,pkg_compiler-libs.common,pkg_camlp-streams,pkg_camlp5,autolink,byte
+ ocamlc.opt -verbose -ppx ./PPX -c -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/fmt -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/ocaml/compiler-libs -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/camlp-streams -I /home/chet/Hack/Opam-2.1.2/GENERIC/5.0.0/lib/camlp5 foo.mli
+ ./PPX '/tmp/camlppxce9c6c' '/tmp/camlppxd1307e'

P.S. PPX rewriters are not passed source, right? They’re passed serialized ASTs, and return the same.

I was explaining what Dune does (which does not use -ppx). Dune passes source code as input to the PPX rewriter.

On the other hand, the mechanism used when you pass -ppx to the compiler does indeed hand the rewriter a marshalled AST as input, and the magic number at the start of that file tells the rewriter whether it is an interface or an implementation.
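As a rough illustration of the magic-number check, the sketch below classifies a file by its leading bytes. It rests on an assumption worth verifying against your compiler: that implementation ASTs start with the prefix "Caml1999M" and interface ASTs with "Caml1999N", each followed by a version suffix that changes between compiler releases (the authoritative values are Config.ast_impl_magic_number and Config.ast_intf_magic_number in compiler-libs). The type ast_kind and the functions kind_of_magic and kind_of_file are hypothetical names used only here.

```ocaml
(* Sketch: decide whether a serialized AST file holds an implementation
   or an interface by looking at its magic-number prefix.
   ASSUMPTION: the prefixes below match Config.ast_impl_magic_number and
   Config.ast_intf_magic_number minus their version suffix. *)

type ast_kind = Implementation | Interface

let prefix_len = String.length "Caml1999M"

let kind_of_magic (header : string) : ast_kind option =
  if String.length header < prefix_len then None
  else
    match String.sub header 0 prefix_len with
    | "Caml1999M" -> Some Implementation
    | "Caml1999N" -> Some Interface
    | _ -> None

let kind_of_file (path : string) : ast_kind option =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      (* Read at most the prefix; shorter files simply yield None. *)
      let len = min prefix_len (in_channel_length ic) in
      kind_of_magic (really_input_string ic len))
```

Running kind_of_file on the saved.src files captured by the wrapper script earlier in the thread would be one way to check this against a given compiler version. Plain source text yields None, which is consistent with the point that source code carries no magic number.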

Cheers,
Nicolas

Just to emphasize: there are significant build speed advantages to the approach based on building a separate ppx driver. First, if there are multiple rewriters, they get combined into a single pass over the parsetree in the ppxlib driver approach. That has mattered a lot in my experience. Second, when the preprocessed parsetree is written to a file, it can be tracked as an intermediate result by the build system. Depending on what you are doing, reusing that work can be meaningful.

My current little project is to add “ocaml.ppx.context” support to the invocation path to Camlp5. I agree that combining PPX rewriters into a single executable (and when possible, the smallest number of passes) is a good thing. Camlp5 already does the former (and always has); the latter is a matter of topological sort and some clever merging.

I’d like to reuse the result of preprocessing, but that requires being able to tease apart findlib packages into the part used by the preprocessor, and the part used by the compiler (ocamlc/opt). That’s a little too complicated to do in a Makefile.