Can I write a ppx to replace `1` with `(add_to_stack 1)`?


The use case is embedding a (type-safe) Forth DSL in OCaml. So

1 2 +

should be expanded by a ppx extension to

(add_to_stack 1) (add_to_stack 2) (word +)

(Or possibly replace + with just plus.)
Is this possible? I know basically nothing about ppx, sorry.
Everything in Forth is a “word” except numbers, which are automatically added to the implicit stack. Since I want to encode the stack with phantom types, numbers too must be represented by functions in OCaml.


The problem is that you have to be able to parse the token-stream. So instead of 1 2 +, would you be willing to accept 1 2 (+) ? If so, then sure you should be able to do it pretty easily. I’d guess that you could modify ppx_here to do it. e.g. expand [%forth 1 2 (+)] into what you want.


You can embed arbitrary languages in OCaml using PPX and quoted strings, eg

{forth| 1 2 + |forth}

You can learn more about PPX preprocessors in “An introduction to OCaml PPX ecosystem” (Tarides).

Cheers,
Nicolas


Thanks, will read up on ppx_here!

A similar project with a typed stack: Michelson, the smart-contract language of Tezos (Tezos documentation, master branch, 2022/10/21 17:06: “Michelson: the language of Smart Contracts in Tezos”).

Example:

module Forth : sig
    type 's t
    type empty = unit
    type empty_t = unit t
 
    val start : unit -> unit t
    val num : int -> 'a t -> ([`number] * 'a) t
    val plus : ([`number] * ([`number] * 'a)) t -> ([`number] * 'a) t
    val dot : ([`number] * 'a) t -> 'a t
end = struct
    type 's t = unit
    type empty = unit
    type empty_t = unit t
 
    let start () = ()
    let num _x s = s  (* the number is not stored; only the stack type changes *)
    let plus s = s
    let dot s = s
end
 
let _ =
    let open Forth in
    start () |> num 1 |> num 2 |> plus |> dot

What I want is the last line to look like

1 2 + .

instead, or at least

1 2 plus dot

Possible…?

Link to ppx_here: janestreet/ppx_here on GitHub (“Expands [%here] into its location”).

Or maybe a variadic function would work better, like

let forth = <variadic function def> in
forth 1 2 plus dot (* <-- Expands to num 1 |> num 2 |> plus |> dot *)
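For the variadic route, one known trick (from Chris Okasaki's work on embedding postfix languages in Haskell) is continuation passing: every word takes the current stack and then the *next* word as a continuation, so plain juxtaposition type-checks with no ppx at all. A minimal sketch, assuming the stack is a nested tuple of real values; `start` and `stop` are hypothetical names, since `begin` and `end` are OCaml keywords:

```ocaml
(* Postfix embedding via continuations: each word receives the current
   stack [s] and then the next word [k], which it calls with the new stack.
   The stack is a nested tuple: (((), 1), 2) means 2 on top of 1. *)

(* [start] kicks things off by passing the empty stack to the first word. *)
let start k = k ()

(* [num s x k]: push the literal [x] that follows [num] in the source. *)
let num s x k = k (s, x)

(* [plus]: pop two numbers, push their sum. *)
let plus ((s, x), y) k = k (s, x + y)

(* [stop]: end of program; return the top of the stack. *)
let stop (_, x) = x

let three = start num 1 num 2 plus stop
let six = start num 1 num 2 plus num 3 plus stop
```

A misuse such as `start plus stop` is rejected at compile time, because `plus` needs two numbers on the stack. The price is that type error messages grow with the stack type.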

The problem is that [%ext ... ] must be valid OCaml syntax, which 1 2 plus dot is not (“1” cannot be applied). Using {%ext| ... |} feels less embedded and more tacked on, and I’d have to write a separate parser (trivial for Forth tho).
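With the quoted-string approach, the text between `{forth|` and `|forth}` is an ordinary string, so the separate parser really can be tiny. A minimal sketch, assuming whitespace-separated tokens and a hypothetical word table mapping Forth words to the `Forth` module functions above; it builds OCaml source text instead of a real AST, only to show the translation:

```ocaml
(* Translate one Forth token to the OCaml pipeline stage it stands for.
   The word table is a hypothetical example, not part of any library. *)
let stage_of_token tok =
  match tok with
  | "+" -> "plus"
  | "." -> "dot"
  | _ -> (
      match int_of_string_opt tok with
      | Some n -> Printf.sprintf "num %d" n
      | None -> failwith ("unknown Forth word: " ^ tok))

(* Turn tabs and newlines into spaces, split on spaces, drop empty
   tokens, and chain the resulting stages with [|>]. *)
let compile src =
  let tokens =
    String.map (fun c -> if c = '\n' || c = '\t' then ' ' else c) src
    |> String.split_on_char ' '
    |> List.filter (fun t -> t <> "")
  in
  String.concat " |> " ("start ()" :: List.map stage_of_token tokens)

(* A quoted string literal: its contents are taken verbatim. *)
let demo = compile {forth| 1 2 + . |forth}
```

A real `{%forth| ... |}` rewriter would build the same thing as a parse tree and keep source locations for error messages; that is the part a PPX library provides.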

Olle,

I got bored, so coded up something that might resemble what you’re looking for. It … took a few minutes, so, y’know, if it’s not useful, not much lost.

Link: chetmurthy/pa_ppx_forth on GitHub (“a little Camlp5-based PPX rewriter for Forth syntax”).

To use this, you need to have camlp5, pa_ppx, and not-ocamlfind installed. At that point, you should be able to just run “make” in the toplevel directory. I haven’t bothered to make an opam file, b/c … well, read on.

  1. test/ppx_forth.ml has a test which should, I hope, look like what you’re asking for. The expansion of
[%forth 1 2 plus dot;]

is

(((start () |> num 1) |> num 2) |> plus) |> dot

which is, I believe, what you were looking for.

  2. If one wanted to instead have a custom syntax, that is to say, to use {%forth| 1 2 + .|} instead, it wouldn’t be much more code: just hack together a little parser. If one wanted the lexemes of this little language to be different from those of OCaml, it would need a little lexer, which isn’t much more work.

  3. BUT BUT BUT all of this is based on Camlp5. And I’m sure not pushing it on anybody, b/c sure, I recognize that it’s not compatible with the main thrust of OCaml development, and anybody who uses Camlp5 is buying into incompatibility.

That said, hey, this is a demonstration that you can do what you want with PPX extensions, b/c everything I did with Camlp5, you can do with PPX extensions – there’s no magic here.

So: look at pa_forth.ml, and you’ll see the rewriter that matches expression PPX-extensions and rewrites them in the manner (I think) you desire. This should be straightforward (albeit tedious and (IMHO) painful) to do with the standard PPX rewriter infrastructure.
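To make the shape of that rewrite concrete without Camlp5 or ppxlib, here is a sketch over a tiny stand-in AST (the `expr` type below is hypothetical, not the compiler's `Parsetree`). `1 2 plus dot` parses as a single application whose head is `1` and whose arguments are `2`, `plus` and `dot`; the rewrite flattens that spine into a list of words and folds them into a `|>` pipeline:

```ocaml
(* A toy expression type standing in for the real OCaml parse tree. *)
type expr =
  | Int of int
  | Ident of string
  | Unit
  | Apply of expr * expr list  (* f a1 ... an *)
  | Pipe of expr * expr        (* a |> b *)

(* [1 2 plus dot] as the parser sees it: [1] applied to three arguments. *)
let input = Apply (Int 1, [ Int 2; Ident "plus"; Ident "dot" ])

(* Flatten the application spine into the sequence of Forth words. *)
let words = function Apply (f, args) -> f :: args | e -> [ e ]

(* Numbers become [num n]; any other word is used as a stage unchanged. *)
let stage = function Int _ as n -> Apply (Ident "num", [ n ]) | w -> w

(* Start from [start ()] and pipe through each word, left to right. *)
let expand e =
  List.fold_left
    (fun acc w -> Pipe (acc, stage w))
    (Apply (Ident "start", [ Unit ]))
    (words e)

(* Print with the parenthesization of the expansion shown earlier.
   Nested applications are not parenthesized; enough for this example. *)
let rec show = function
  | Int n -> string_of_int n
  | Ident s -> s
  | Unit -> "()"
  | Apply (f, args) -> String.concat " " (List.map show (f :: args))
  | Pipe ((Pipe _ as a), b) -> "(" ^ show a ^ ") |> " ^ show b
  | Pipe (a, b) -> show a ^ " |> " ^ show b
```

In a ppxlib rewriter the same match would roughly be on `Pexp_apply` and `Pexp_constant`, with the output built by metaquot, e.g. `[%expr [%e acc] |> num [%e n]]`.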

P.S. If one wanted to hack together a little parser and/or little lexer, there are examples of just that in the Camlp5 tutorials (the tutorials directory of camlp5/camlp5 on GitHub), albeit not combined with PPX rewriters – but that’s not very difficult to do. Again though, all that stuff is dependent on Camlp5, and I’m not suggesting that anybody use it, b/c “nonstandard”.


Oh wow, I’m speechless, nicely done. Will look at it tomorrow, it’s late now. ❤️ Also join the IRC for discussions, btw (libera, not freenode).

Can this be used interactively via the OCaml command-line environment? Maybe it could then actually be fed into Forth, if it type-checks.

Two answers:

  1. If you write a PPX rewriter using the standard machinery, yes, you can use it from a toplevel (which is what I assume you mean by “commandline environment”).

  2. The Camlp5-based pa_ppx PPX rewriters can all be loaded into a toplevel environment and used there too.

I don’t know what you mean by “fed into Forth”, but if you can explain further, I might be able to answer. In any case, unless you want to use Camlp5 (which … well, sure I think it’s great stuff, but I’m not going to push it on anybody, b/c it’s clearly not the direction chosen for OCaml), it would seem like the next step really is to learn how to replicate this little rewriter using the standard PPX libraries and machinery.

Thanks!

I’m mostly still brainstorming on a conceptual level to figure out if it’s at all possible.

Maybe the imagined pipeline would be something like

Syntax extension --> type-checked OCaml DSL --> string of Forth code --> Unix.exec forth_string

So that the generated Forth code will be evaluated by, say, gforth. You’d get something like a type-checked interactive Forth. Is that clear?
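For the “type-checked OCaml DSL --> string of Forth code” step, one option (an assumption, not something from the code in this thread) is to keep the phantom stack types from the `Forth` module above but let `'s t` carry the Forth words emitted so far instead of `unit`. OCaml checks the stack effects, and the resulting value is the program text:

```ocaml
module Forth : sig
  type 's t
  val start : unit -> unit t
  val num : int -> 'a t -> ([ `number ] * 'a) t
  val plus : ([ `number ] * ([ `number ] * 'a)) t -> ([ `number ] * 'a) t
  val dot : ([ `number ] * 'a) t -> 'a t
  val to_forth : 's t -> string
end = struct
  (* Emitted words, most recent first; the type parameter stays phantom. *)
  type 's t = string list

  let start () = []
  let num n s = string_of_int n :: s
  let plus s = "+" :: s
  let dot s = "." :: s

  (* Restore source order and join with spaces. *)
  let to_forth s = String.concat " " (List.rev s)
end

let program =
  let open Forth in
  start () |> num 1 |> num 2 |> plus |> dot |> to_forth
```

Handing `program` to gforth (for instance through the `Unix` library) is left out of this sketch.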

Edit: Found a PDF talk on type-inference for stack languages, which includes a Haskell example using tuples (no syntax extension). https://raw.githubusercontent.com/nuprl/hopl-s2017/master/type-inference-for-stack-languages/talk-notes.pdf

Another discussion thread on adding types to Forth: “Adding a "type" checker to Forth”, issue #79 in ForthHub/discussion on GitHub.


If the only reason you’re wanting to generate some sort of Forth-in-OCaml is for typechecking, can I suggest that it might be easier to just write a typechecker for Forth? (grin)
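Taken literally, the simplest useful checker only tracks stack depth. A minimal sketch, assuming each word's stack effect is known as (consumed, produced); the word table is a small hypothetical excerpt, and the checker ignores element types, control flow and `:` definitions:

```ocaml
(* Stack effects (consumed, produced) for a few standard Forth words;
   any integer literal pushes one cell. *)
let stack_effect = function
  | "+" | "-" | "*" -> Some (2, 1)
  | "." | "drop" -> Some (1, 0)
  | "dup" -> Some (1, 2)
  | "swap" -> Some (2, 2)
  | w -> (
      match int_of_string_opt w with Some _ -> Some (0, 1) | None -> None)

(* Walk the words left to right, tracking stack depth. Returns [Ok depth]
   at the end, or [Error msg] for the first unknown or underflowing word. *)
let check src =
  let words =
    String.split_on_char ' ' src |> List.filter (fun w -> w <> "")
  in
  let step acc w =
    match acc with
    | Error _ -> acc
    | Ok depth -> (
        match stack_effect w with
        | None -> Error ("unknown word " ^ w)
        | Some (consumed, produced) ->
            if depth < consumed then Error ("stack underflow at " ^ w)
            else Ok (depth - consumed + produced))
  in
  List.fold_left step (Ok 0) words
```

For example, `check "1 2 + ."` ends with an empty stack, while `check "1 +"` reports an underflow at `+`.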

I already have another compiler project going on, I’m not allowed to switch to another… 😶 Duct taping together something workable is allowed tho… Kinda.

Edit: Reminder to self that Forth includes multiple stacks (data stack, return stack, floating-point stack, …) and also has both interpretation mode and compilation mode, e.g. when defining a new word with : and ;.
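Multiple stacks fit the same typed-stack idea by making the state a pair of stacks. A sketch, assuming only a data stack and a return stack, with real values in nested pairs (top first) rather than phantom tags so the result can be checked; `to_r` and `from_r` are hypothetical OCaml names for Forth's `>r` and `r>`:

```ocaml
(* A machine state is (data stack, return stack); each stack is a nested
   pair (top, rest), and [()] is an empty stack. *)
let start () = ((), ())
let num n (d, r) = ((n, d), r)
let plus ((x, (y, d)), r) = ((x + y, d), r)

(* Forth's >r: move the top of the data stack to the return stack. *)
let to_r ((x, d), r) = (d, (x, r))

(* Forth's r>: move the top of the return stack back to the data stack. *)
let from_r (d, (x, r)) = ((x, d), r)

(* Read the top of the data stack. *)
let top ((x, _), _) = x

let result = start () |> num 1 |> to_r |> num 2 |> from_r |> plus |> top
```

Something like `start () |> from_r` is rejected by the type checker, since the return stack starts out empty (`unit`).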

Edit 2: Forth has ' to get the “execution token” of any word, which can be used together with execute to run that word. It’s used for higher-order programming like a list map. Not sure how that would be typed in the above system using phantom types.
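One way to type `'` and `execute`, sketched under the assumption that an execution token can simply be the OCaml function for the word, with real values on the stack instead of phantom tags: `tick` pushes the function and `execute` pops it and applies it to the rest of the stack, so the type checker demands that the remaining stack fit the word. This does not cover tokens that must work for any stack below them when passed around (as a list `map` would need); that likely requires extra machinery such as polymorphic record fields:

```ocaml
(* The stack is a nested pair (top, rest); [()] is the empty stack. *)
let start () = ()
let num n s = (n, s)
let plus (x, (y, s)) = (x + y, s)

(* Forth's ' (tick) reads the next word's name; here the word is passed
   directly and pushed as the execution token. *)
let tick word s = (word, s)

(* Forth's execute: pop a token and run it on the remaining stack. *)
let execute (word, s) = word s

let top (x, _) = x

let result = start () |> num 1 |> num 2 |> tick plus |> execute |> top
```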

I just wanted to clarify a point: you can pick whatever syntax you want as long as it can be parsed to an OCaml AST, which can get rewritten to a valid OCaml program. The intermediate AST does not need to type check.

Concretely: 1 2 plus dot is valid OCaml syntax. It’s the syntax of a function call (Pexp_apply) where the “function” expression is 1 and the 3 arguments are 2, plus, dot.

You’re right that it’s an invalid OCaml program, because it does not typecheck, but ppx rewriting happens before that. It’s common to pick some syntax that happens to map to the OCaml parse tree but is not valid OCaml. For example:

  • in the attribute [@@deriving map,show], the map,show part is a tuple. map is an identifier (even if map is not in scope).
  • in ppx_sexp_message the syntax used in the example is "string" ~source:(tmpfile : string) but of course a string isn’t a function that takes labelled arguments
  • in ppx_custom_printf the syntax used is !"string" but it does not correspond to an actual (!) operator - and actually you can’t use !s with a non-literal string s.

TIL those quoted string literals!

Also in that article:

There is a wide variety of PPX rewriters but the ones you’ll probably see the most are Extensions and Derivers.

I’ve played with ppxlib a little bit and the docs don’t really have info about anything other than Extensions and Derivers.

Re-reading now they mention:

It is also possible to write more advanced transformations such as rewriting constants that bear the right suffix, rewriting function calls based on the function identifier or to generate code from items annotated with a custom attribute but we won’t cover those in this section.

and

Note that you can also write arbitrary, whole AST transformations with ppxlib but they don’t have a clear composition semantic since they have to be applied sequentially as opposed to the other, better defined rewriting rule.

Does anyone have pointers to an entry-point for how to implement either of those?

You may have a look at src/context_free.mli in the ppxlib source tree.

For “arbitrary, whole AST transformations”, you may look at the optional arguments ?impl, ?intf, ?preprocess_impl, ?preprocess_intf of Driver.register_transformation.
