[Discussion] Writing a transpiler from PHP to polyglot PHP+C code

So it’s possible to write polyglot code that both compiles in C and runs in PHP, with a couple of tricks and helper functions. Example code here: pholyglot/polyglot.c at main · olleharstedt/pholyglot · GitHub

Now I’m thinking about writing a transpiler that will eat a subset of modern PHP code, and spit out polyglot PHP+C code. Of course using Menhir and OCaml. :slight_smile:

Use-case:

  • In my snippet, the C version runs 400% faster than the PHP version, despite being the “same” code, so this could be a way to optimize parts of a code base without people having to learn a new language. (I chose to test input/output instead of numerical calculations, because C will obviously be faster in the latter case.)

Challenges:

  • PHP’s associative arrays have to go; use C arrays, linked lists and “normal” hashtables with string keys instead.
  • Typing, but this can probably be solved by requiring function arguments to be typed, and inferring the types of variables based on usage.
  • Inheritance and interfaces, I’d have to test some more here… AFAIK, you can do OOP in C by manually writing a vtable; interfaces would just be void*, I guess? Classes without methods can be compiled to structs easily.
  • Memory management. Just don’t collect…? And make sure the program is short, lol. Or slap Boehm GC on it.
  • Separate compilation in C has to be reconciled with include files in PHP. Dunno how to solve that yet. Maybe just glue everything together as one file upon compilation.
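
To make the “classes without methods can be compiled to structs” point a bit more concrete, here is a minimal OCaml sketch. The `php_class` record, the `c_type` mapping (PHP `int` to C `long`, and so on) and the function names are all assumptions made up for illustration; none of this is taken from the pholyglot repo.

```ocaml
(* Hypothetical model of a PHP class that has only typed properties. *)
type php_type = TInt | TFloat | TString

type php_class = {
  name : string;
  props : (string * php_type) list;  (* property name, declared type *)
}

(* Assumed mapping from PHP scalar types to C types. *)
let c_type = function
  | TInt -> "long"
  | TFloat -> "double"
  | TString -> "char*"

(* Render the class as a C struct definition, one field per property. *)
let struct_of_class (c : php_class) : string =
  let field (n, t) = Printf.sprintf "    %s %s;\n" (c_type t) n in
  Printf.sprintf "struct %s {\n%s};\n" c.name
    (String.concat "" (List.map field c.props))

(* Usage: a Point class with two int properties. *)
let point = { name = "Point"; props = [ ("x", TInt); ("y", TInt) ] }
let () = print_string (struct_of_class point)
```

Methods, inheritance and interfaces would need more than this (a vtable, as mentioned above), so the sketch deliberately stops at plain data classes.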

This is similar to Facebook’s project to compile PHP to C++, but they were hindered by the fact that they had to support the full language (or 99% of it). By supporting only a subset, I think you can squeeze out a much higher performance gain.

Why compile to polyglot code instead of just C or C++? To smooth out the transition between two languages or ecosystems, maybe? And lower the threshold of commitment; you can easily “rollback”, in a sense.

Maybe more related to programming language design than OCaml itself, but, there you go. :blush:

Didn’t FB then abandon the C++ target, and instead develop a nearly-PHP language with its own runtime (HipHop VM, HHVM) that admitted progressive typing? I thought FB hired a bunch of OCaml hackers to do that work?

They did abandon the C++ transpiler HipHop, because the performance gain was not that good. The virtual machine they made afterwards is sadly not that good either, compared to later versions of PHP. My attempt differs in three ways:

  1. Subset of PHP might open some performance doors that were not available for the C++ transpiler
  2. Polyglot output to lessen the commitment, or cost of rolling back to vanilla PHP
  3. Target short-lived scripts (don’t collect memory, or only Boehm GC)

For Hack, it’s actually only the type checking that’s coded in OCaml. The compiler backend is still C++. I think. Or is it the VM? Not sure. :blush:

Made a small prototype here, very standard thing: pholyglot/src at main · olleharstedt/pholyglot · GitHub

Parser and lexer in Menhir, AST that represents the subset PHP lang, then I’d have to iterate over it to infer some types, transform to polyglot AST and from there to string.
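
Roughly, the AST, inference and to-string steps could look like the sketch below. This is a toy under loud assumptions: the AST constructors, the type names and the `emit` function are hypothetical, the output is plain C declarations rather than the real polyglot form, each variable is assumed to be assigned only once, and string literals are not escaped.

```ocaml
(* Untyped AST for a tiny, hypothetical PHP subset. *)
type expr =
  | Int of int
  | Str of string
  | Var of string
  | Add of expr * expr

type stmt = Assign of string * expr  (* $name = expr; *)

type typ = TInt | TString

(* Infer the type of an expression, given the types of earlier variables. *)
let rec infer env = function
  | Int _ -> TInt
  | Str _ -> TString
  | Var v ->
    (match List.assoc_opt v env with
     | Some t -> t
     | None -> failwith ("unbound variable $" ^ v))
  | Add (a, b) ->
    (match infer env a, infer env b with
     | TInt, TInt -> TInt
     | _ -> failwith "+ expects two ints")

let rec show_expr = function
  | Int n -> string_of_int n
  | Str s -> "\"" ^ s ^ "\""  (* no escaping in this sketch *)
  | Var v -> v
  | Add (a, b) -> show_expr a ^ " + " ^ show_expr b

let show_typ = function TInt -> "long" | TString -> "char*"

(* Walk the statements in order, threading the type environment,
   and emit one C declaration per assignment. *)
let emit (prog : stmt list) : string list =
  let step (env, out) (Assign (v, e)) =
    let t = infer env e in
    let line = Printf.sprintf "%s %s = %s;" (show_typ t) v (show_expr e) in
    ((v, t) :: env, line :: out)
  in
  List.rev (snd (List.fold_left step ([], []) prog))

(* Usage: $a = 1; $b = $a + 2; $s = "hi"; *)
let prog =
  [ Assign ("a", Int 1); Assign ("b", Add (Var "a", Int 2)); Assign ("s", Str "hi") ]
let () = List.iter print_endline (emit prog)
```

A real version would probably produce a separate typed AST between `infer` and the printer, so that the polyglot tricks live in one place.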

The one thing to make it more professional would be proper error messages for the end user… But you have to carry file and line in the AST, right? Maybe I can google around. :thinking:
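
One common way to do that, sketched here under assumptions: wrap every AST node in a record that carries a location built from the standard library’s `Lexing.position`. In a Menhir grammar the position would typically come from the `$startpos` keyword in a semantic action; below the nodes are built by hand so the example runs on its own. The names `loc`, `located`, `Compile_error` and `check_bound` are hypothetical.

```ocaml
(* A source location; Lexing.position already carries file name and line. *)
type loc = { file : string; line : int; col : int }

let loc_of_position (p : Lexing.position) : loc =
  { file = p.Lexing.pos_fname;
    line = p.Lexing.pos_lnum;
    col = p.Lexing.pos_cnum - p.Lexing.pos_bol + 1 }

(* Every node is wrapped together with the location it was parsed at. *)
type 'a located = { node : 'a; loc : loc }

type expr = expr_node located
and expr_node =
  | Int of int
  | Var of string
  | Add of expr * expr

exception Compile_error of loc * string

(* Format an error the way many compilers do: file:line:col: error: msg *)
let error_message (l : loc) (msg : string) : string =
  Printf.sprintf "%s:%d:%d: error: %s" l.file l.line l.col msg

(* Example check: every variable used must be in the list of known names. *)
let rec check_bound (known : string list) (e : expr) : unit =
  match e.node with
  | Int _ -> ()
  | Var v ->
    if not (List.mem v known) then
      raise (Compile_error (e.loc, "undefined variable $" ^ v))
  | Add (a, b) -> check_bound known a; check_bound known b

(* Usage: a hand-built node for `$x` at a.php, line 3, column 5. *)
let l = { file = "a.php"; line = 3; col = 5 }
let e = { node = Var "x"; loc = l }
let () =
  try check_bound [] e
  with Compile_error (l, msg) -> prerr_endline (error_message l msg)
```

The cost is that every pattern match goes through `.node`, but later passes (type inference, code generation) can then report errors at the right file and line without extra bookkeeping.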