Hygienic macros for OCaml

I’ve been thinking about how to cleanly implement some form of hygienic macros for OCaml. Specifically, I’m thinking about how to implement something like Rust’s macro_rules. The problem is that the objects those macros manipulate are “token-trees”: streams of tokens, structured into trees at “()”, “[]”, “{}”. It would make sense to stick to syntax that fits with OCaml’s PPX, so one would want some form of PPX extension whose payload is specified to be a token-tree stream. Today, the payloads can be structures, signatures, patterns, or types; strings and expressions enter as structures.

None of these work for hygienic macros implemented in the style of Rust: such macros require that the result of macro-expansion be re-parsed, and it can easily be the case that the syntactic category of a token-tree argument to a macro (is “a * b” a type, or an expression?) cannot be inferred until that final parse. So it would seem that one would need a new kind of payload for PPX extensions: a token-tree. Viz.

[%foo! (a, b,c)]

where the “!” indicates that the payload is a token-tree stream bounded by that closing “]”.
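To make the category problem concrete, here is a minimal Rust sketch (the macro names `rebind` and `default_of` are made up for illustration). A parenthesised token tree is captured as `tt` and only receives a syntactic category when the expansion is parsed: as a pattern, as an expression, or as a type.

```rust
// Hypothetical macros, for illustration only.

// `$pair` is captured as one unparsed token tree (a parenthesised group).
macro_rules! rebind {
    ($pair:tt, $value:expr) => {{
        let $pair = $value; // here the tokens are parsed as a pattern
        $pair               // here the same tokens are parsed as an expression
    }};
}

// The same kind of token tree, this time parsed as a type.
macro_rules! default_of {
    ($ty:tt) => {{
        let v: $ty = Default::default();
        v
    }};
}

fn rebind_demo() -> (i32, i32) {
    rebind!((a, b), (1, 2))
}

fn default_demo() -> (i32, i32) {
    default_of!((i32, i32))
}
```

Neither macro could declare up front whether its argument is a pattern, an expression, or a type without losing one of these uses, which is the difficulty with the existing PPX payload kinds.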

One might ask “why not use a string payload” as in

{%foo|(a,b,c)|} (which is shorthand, if I’m remembering the syntax right, for [%foo {|(a,b,c)|}])

and the answer is that in nested macro-invocations, we would want to rename any hygienically-created identifiers, but that will be difficult if they are embedded in string payloads. It would involve repeated parsing/renaming/stringification. A big mess, and hard to keep location information consistent (for debugging).

In Camlp5, I can sidestep all of this, b/c I can extend the syntax in arbitrary ways, but obviously if hygienic macros are going to be useful, they’ll need to be implemented with the current PPX infrastructure; hence my question to you all.

I’d welcome any feedback.


You might be interested to know that we’ve recently restarted work on the modular macros work in Cambridge, with the aim of completing and formalising the design and bringing the implementation to an upstreamable state.

Modular macros are computations that hygienically construct code from typed fragments and integrate smoothly with other OCaml features such as modules. The details are rather different from Rust’s macros, but perhaps they’ll support the use cases you’re interested in. There are some examples in the extended abstract, such as the typed printf function that builds a printer from a format description:

macro rec printk : type a b. (string expr -> b expr) -> (a, b) fmt -> a expr =
  fun k -> function
  | Int -> << fun s -> $(k <<string_of_int s>>) >>
  | Lit s -> k (lift_string s)
  | Cat (l, r) -> printk (fun x ->
                  printk (fun y -> k << $x ^ $y >>) r) l

and some larger examples in @otini’s macros-examples repository, such as a port of the strymonas stream fusion library.


Jeremy,

I think it’s worth looking at the Rust model for macro-writing, independently of hygiene (which is important, but it’s not all that’s interesting about Rust’s model). I attach a macro that one uses to format/write out a complex number. Here’s a use: write_complex!(f, "", "", self.re, self.im, T) (where f is a formatter object).

This is a lot of code, and complex code, and yet, you basically write it as if you’re just writing plain Rust. And this model works for a remarkable number of different kinds of macros. That’s powerful, and a model that I think might be valuable for OCaml.

Also, for the example you adduce (implementing printf), the way Rust does it is to combine simple macros with its counterpart to modular implicits (traits). Again, remarkably powerful.

macro_rules! write_complex {
    ($f:ident, $t:expr, $prefix:expr, $re:expr, $im:expr, $T:ident) => {{
        let abs_re = if $re < Zero::zero() {
            $T::zero() - $re.clone()
        } else {
            $re.clone()
        };
        let abs_im = if $im < Zero::zero() {
            $T::zero() - $im.clone()
        } else {
            $im.clone()
        };

        return if let Some(prec) = $f.precision() {
            fmt_re_im(
                $f,
                $re < $T::zero(),
                $im < $T::zero(),
                format_args!(concat!("{:.1$", $t, "}"), abs_re, prec),
                format_args!(concat!("{:.1$", $t, "}"), abs_im, prec),
            )
        } else {
            fmt_re_im(
                $f,
                $re < $T::zero(),
                $im < $T::zero(),
                format_args!(concat!("{:", $t, "}"), abs_re),
                format_args!(concat!("{:", $t, "}"), abs_im),
            )
        };

        fn fmt_re_im(
            f: &mut fmt::Formatter<'_>,
            re_neg: bool,
            im_neg: bool,
            real: fmt::Arguments<'_>,
            imag: fmt::Arguments<'_>,
        ) -> fmt::Result {
            let prefix = if f.alternate() { $prefix } else { "" };
            let sign = if re_neg {
                "-"
            } else if f.sign_plus() {
                "+"
            } else {
                ""
            };

            if im_neg {
                fmt_complex(
                    f,
                    format_args!(
                        "{}{pre}{re}-{pre}{im}i",
                        sign,
                        re = real,
                        im = imag,
                        pre = prefix
                    ),
                )
            } else {
                fmt_complex(
                    f,
                    format_args!(
                        "{}{pre}{re}+{pre}{im}i",
                        sign,
                        re = real,
                        im = imag,
                        pre = prefix
                    ),
                )
            }
        }

        #[cfg(feature = "std")]
        // Currently, we can only apply width using an intermediate `String` (and thus `std`)
        fn fmt_complex(f: &mut fmt::Formatter<'_>, complex: fmt::Arguments<'_>) -> fmt::Result {
            use std::string::ToString;
            if let Some(width) = f.width() {
                write!(f, "{0: >1$}", complex.to_string(), width)
            } else {
                write!(f, "{}", complex)
            }
        }

        #[cfg(not(feature = "std"))]
        fn fmt_complex(f: &mut fmt::Formatter<'_>, complex: fmt::Arguments<'_>) -> fmt::Result {
            write!(f, "{}", complex)
        }
    }};
}

ETA: in this case, the macro is pretty simple and doesn’t do any pattern-matching. But lots of macros do pattern-matching and deal with repeated arguments and such. It’s a lot like syntax-rules in Scheme (I think that’s the one – might be one of the other ones that has “…” for repeated arguments).
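For a sense of what pattern-matching and repeated arguments look like in macro_rules, here is a small sketch (the macros `sum` and `max_of` are made-up examples, not taken from any library):

```rust
// Hypothetical macros, for illustration only.

// `$( ... ),*` matches zero or more comma-separated expressions,
// much like `...` in Scheme's syntax-rules.
macro_rules! sum {
    ($($x:expr),* $(,)?) => { 0 $(+ $x)* };
}

// Two rules tried in order: a base case and a recursive case.
macro_rules! max_of {
    ($x:expr) => { $x };
    ($x:expr, $($rest:expr),+) => {{
        let head = $x;
        let tail = max_of!($($rest),+);
        if head > tail { head } else { tail }
    }};
}

fn sum_demo() -> i32 {
    sum!(1, 2, 3)
}

fn max_demo() -> i32 {
    max_of!(3, 9, 4)
}
```

Here `sum!(1, 2, 3)` expands to `0 + 1 + 2 + 3`, and `max_of!` recurses on the remaining arguments until the single-argument rule applies.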

Sorry for the off-topic question, but what does “hygienic” mean in an OCaml/FP/language-design context?

It refers to the same idea as in Scheme; see the Wikipedia article “Hygienic macro”.

In short, if the expansion of a macro introduces bound variables, then those bound variables must not inadvertently capture free variables. A typical way of achieving this is that bound variables introduced in the macro’s expansion are always chosen to be fresh via some gensym-like method.

There are more requirements, but really, that’s the most important one.
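A small Rust illustration of the capture problem, since macro_rules is hygienic for local variables (the macro `double` is a made-up example):

```rust
// Hypothetical macro, for illustration only.
macro_rules! double {
    ($e:expr) => {{
        let tmp = 2; // binder introduced by the expansion
        tmp * $e     // `$e` may itself mention a caller variable named `tmp`
    }};
}

fn hygiene_demo() -> i32 {
    let tmp = 10;
    // The `tmp` in the argument refers to the caller's binding (10), not the
    // macro's own `tmp`: the result is 2 * (10 + 1), not 2 * (2 + 1).
    double!(tmp + 1)
}
```

A naive textual expansion would produce `{ let tmp = 2; tmp * (tmp + 1) }` and evaluate to 6. With hygiene the two `tmp`s are kept apart, as if the macro’s one had been renamed to a fresh identifier, and the result is 22.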

Hope this helps.
