[ANN] ppx_format

I am happy to announce the first release of ppx_format.

It's a small ppx rewriter, first written at the Mirage retreat in 2024 with @PizieDust, that lets you put values in the middle of format strings:

let s = "World"
let x = 123
let () = Format.printf {%i|Hello {%s s} {%a Format.pp_print_char % Char.chr 65} {%d x}%!|}

It's compatible with any function that takes format strings. The only constraint is that the format string has to be the last argument.
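To make the example above concrete, here is a sketch of what it presumably amounts to in plain OCaml. The exact code the ppx generates is an assumption on my part; in particular, I am reading the `%` inside `{%a ...}` as the separator between the printer and the value.

```ocaml
let s = "World"
let x = 123

(* Presumed plain-OCaml equivalent of
   {%i|Hello {%s s} {%a Format.pp_print_char % Char.chr 65} {%d x}%!|}:
   each {%spec value} contributes its specifier to the format string and
   its value(s) to the argument list, in order. *)
let () =
  Format.printf "Hello %s %a %d%!" s Format.pp_print_char (Char.chr 65) x

(* The same format string handed to another function that takes a format
   string as its last argument. *)
let as_string =
  Format.asprintf "Hello %s %a %d" s Format.pp_print_char (Char.chr 65) x
```

Since the format string stays an ordinary format string, a mismatch between a specifier and its value would still be a type error at compile time.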

I have used it in some of my projects, and it will be available on opam as soon as the release PR is merged.

Would it be possible to make the syntax like this: Format.printf {|Hello %s{s}|}?

Fun coincidence: at the last Mirage retreat I discussed format interpolation with @octachron, and I wrote the beginning of an RFC for it (and then promptly forgot about it). I just pushed it at …/format-interpolation.md.

You might consider a different interpolation syntax (viz. camlp5/pa_ppx_fmtformat on GitHub, a PPX rewriter that provides string interpolation using Fmt as the underlying mechanism).

The advantage of the suggested format for interpolated expressions over “%{…%}” is that with the multiple forms, there’s never a need to escape – you can just pick a different form.

====

The simplest interpolated expression is of the form $(...) but all of the following are accepted:

  • $(...), $(|...|)
  • $[...], $[|...|]
  • ${...}, ${|...|}
  • $<...>, $<|...|>

So basically, ‘$’ followed by any of [ ‘(’, ‘[’, ‘{’, ‘<’ ],
optionally ‘|’, and then at the end, the matching text. Between these
8 forms, it should be possible to enclose any interpolated expression
without difficulty, I would think.

In the text surrounded by these delimiters, anything other than the
end-string is acceptable, and there is no provision made for escaping.
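The delimiter rule above can be sketched as a small scanner. This is not the actual pa_ppx_fmtformat lexer, only an illustration under the stated rule: `$`, one of the four opening characters, an optional `|`, then everything up to the first occurrence of the matching end-string, with no escaping. The names `piece`, `closer`, `find_from` and `split` are made up for the example.

```ocaml
(* One piece of a template: literal text, or the raw text of an interpolation. *)
type piece = Lit of string | Interp of string

(* Closing delimiter for each opening character accepted after '$'. *)
let closer = function
  | '(' -> Some ")"
  | '[' -> Some "]"
  | '{' -> Some "}"
  | '<' -> Some ">"
  | _ -> None

(* Index of the first occurrence of [sub] in [s] at or after [from]. *)
let find_from s from sub =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

let split s =
  let n = String.length s in
  (* Emit the pending literal s.[start .. stop-1], if it is non-empty. *)
  let flush start stop acc =
    if stop > start then Lit (String.sub s start (stop - start)) :: acc else acc
  in
  (* [start] is where the pending literal began, [i] the scan position. *)
  let rec go start i acc =
    if i >= n then List.rev (flush start n acc)
    else
      match (if s.[i] = '$' && i + 1 < n then closer s.[i + 1] else None) with
      | None -> go start (i + 1) acc
      | Some close ->
        (* An optional '|' right after the opener selects the barred form. *)
        let bar = i + 2 < n && s.[i + 2] = '|' in
        let close = if bar then "|" ^ close else close in
        let body = if bar then i + 3 else i + 2 in
        (match find_from s body close with
         | None -> go start (i + 1) acc (* unterminated: treat as literal *)
         | Some j ->
           let acc =
             Interp (String.sub s body (j - body)) :: flush start i acc
           in
           let next = j + String.length close in
           go next next acc)
  in
  go 0 0 []

let pieces = split "a $(abc|%d) b $[|f (x)|]"
```

Here the second interpolation contains a `)`, which is harmless because its end-string is `|]`; that is the "pick a different form instead of escaping" idea.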

The contents of the interpolated expression can be of three forms:

==== interpolated expression with format-specifier: $( <expression> | <format-specifier> )

An interpolated expression of the form $(abc|%d) specifies that the
expression abc will be formatted with %d. So {%fmt_str|a $(abc|%d)|} expands to
Fmt.(str "a %d" abc).

==== interpolated expression with Fmt formatter: $( <expression> | <Fmt formatter expression> )

An interpolated expression of the form $(abc|int) specifies that the
expression abc will be formatted with the Fmt formatter int. So {%fmt_str|a $(abc|int)|} expands to
Fmt.(str "a %a" int abc).

==== interpolated expression without specifier/formatter: $( <expression> )

An interpolated expression of the form $(abc) specifies that the
expression abc will be formatted with %s. So {%fmt_str|a $(abc)|} expands to
Fmt.(str "a %s" abc).
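For readers without Fmt installed, the three expansions above can be tried with standard-library stand-ins. This sketch assumes that `Fmt.str` behaves like `Format.asprintf` and that `Fmt.int` behaves like `Format.pp_print_int`; the variable names are only for illustration.

```ocaml
let abc_int = 42
let abc_str = "forty-two"

(* $(abc|%d)   expands to   Fmt.(str "a %d" abc) *)
let with_spec = Format.asprintf "a %d" abc_int

(* $(abc|int)  expands to   Fmt.(str "a %a" int abc);
   Format.pp_print_int stands in for Fmt.int here. *)
let with_formatter = Format.asprintf "a %a" Format.pp_print_int abc_int

(* $(abc)      expands to   Fmt.(str "a %s" abc), so abc must be a string *)
let bare = Format.asprintf "a %s" abc_str
```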

Implementation-wise it should be pretty easy. However, I use it quite a bit in other projects, so I would prefer not to change the syntax. If you really want this, I think it should be easy to maintain a fork.

Interesting indeed. On the implementation side, my version basically has a second lexer.

I don't have provision for escaping either. My idea is that if you want something like "{%" in your expression, you can always bind that expression to a variable and use the variable. I don't think it's very nice to have very complex expressions inside the format string.
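As a small illustration of that workaround, in plain OCaml: the non-trivial expressions are bound to names first, and only the names appear as format arguments. The data and names are made up, and the ppx form mentioned in the comment is my guess at how it would read.

```ocaml
let scores = [ 3; 5; 8 ]

(* Bind the non-trivial expressions first... *)
let total = List.fold_left ( + ) 0 scores
let label = String.concat "," (List.map string_of_int scores)

(* ...so that the format arguments are plain names. With the ppx this would
   presumably read {%i|{%s label} sums to {%d total}|}. *)
let line = Format.asprintf "%s sums to %d" label total
```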

In my opinion, using {% in the format string is more of an issue than using it in the expressions, and it's not solved by your proposed syntax.

Also, the idea of having a special syntax for %a that does not have %a in it might be good, but my goal with this ppx was to provide maximum familiarity: someone used to printf/format can read code written with my ppx and immediately guess correctly what is happening.

Emile,

First, your rationale is excellent, and I understand your reasoning. I cannot fault it. I would add, though, that once you can start using the equivalent of [from Perl5]

"abc $foo def $bar"

in OCaml, you start using more and more of the stuff, assembling more and more complicated strings containing interpolated variables. And then you get to where you want an interpolated -expression- …

"abc ${\( f($foo) )} def $bar"

and you’re off to the races. That idiom ${\( …)} is a “scalar context” within a string – you can put any scalar expression in there. And at that point, you can write rather complex structural expressions, that are all one big printf.

It was when I realized I could do this in Perl5 back in 1995, that I became a Perl bigot.

All of this is a way of trying to express that while surely Printf-style format-strings do not encourage complexity and depth in the expressions one writes, when you can start doing interpolation, and esp. interpolation with -expressions-, at least some people will find it irresistible.

And I do think that it’s more readable to do things this way, than to use loops/iterators and such to write the same thing without interpolation.

All that said, for sure I understand your position, and given your design goal, I can’t dispute your decision.

I think that's probably quite good, but in OCaml, you get degraded tooling inside the format string: merlin is spotty, there is no ocamlformat, no syntax highlighting. So in my opinion it's better to outsource complex logic to a variable above. If such expressions-in-strings were a first-class citizen of the language, like I am guessing they are in Perl, it would be quite different.

:smile: it was great watching you build this