I’m hacking on an OCaml backend targeting Golang, currently in the PoC phase and unpublished. So far I have decided to build on top of https://github.com/jordwalke/rehp by @jordwalke. Rehp is a fork of js_of_ocaml that adds a layer of generalized intermediate representation between generate.ml and the actual backend producing code in the target language. In upstream js_of_ocaml this is hard-wired to produce a JS-specific tree right from generate.ml. As far as I know, @hhugo was not opposed to upstreaming some of the work in Rehp into js_of_ocaml.
As of now Rehp has gotten a bit stale: it’s still based on jsoo 3.6 while 3.8 is already out. Also, the fact that bytecode is untyped causes some pain when targeting a statically typed language, so I wanted to share some thoughts on what’s happening in the OCaml backend landscape and get some feedback/ideas from the community to understand which direction is best nowadays for alternative backend authors.
Golang is flexible enough that one can write fully dynamic code like in JavaScript, and that’s how I managed to get my backend to support what Rehp emits. But performance is abysmal: basically I have to box everything, including integers, so that I have one single dynamic type (interface{}) representing any OCaml value. Unfortunately I can’t use the same trick with integers and pointers that OCaml uses in the native target: the Golang GC is not happy about that and crashes when it sees a pointer which is actually an integer.
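To make the boxing cost concrete, here is a minimal sketch of what such a uniform representation could look like. The names Value, Block, AddInt and SumList are made up for illustration and are not taken from Rehp or from my backend; the list encoding (immediate 0 for the empty list, a tag-0 block for a cons cell) follows the usual OCaml runtime layout.

```go
package main

import "fmt"

// Value is a hypothetical uniform representation: every OCaml value,
// immediate or heap-allocated, sits behind an empty interface.
type Value interface{}

// Block is a hypothetical stand-in for an OCaml heap block (tag + fields).
type Block struct {
	Tag    int
	Fields []Value
}

// AddInt adds two boxed OCaml integers. Each operand needs a runtime
// type assertion, and the result is boxed into an interface again.
func AddInt(a, b Value) Value {
	return a.(int) + b.(int)
}

// SumList walks an OCaml-style int list: the immediate 0 is [],
// and a block with tag 0 is a cons cell (head, tail).
func SumList(l Value) int {
	total := 0
	for {
		cell, ok := l.(*Block)
		if !ok {
			return total // reached the immediate that encodes []
		}
		total += cell.Fields[0].(int)
		l = cell.Fields[1]
	}
}

func main() {
	var list Value = 0 // []
	for i := 3; i >= 1; i-- {
		list = &Block{Tag: 0, Fields: []Value{i, list}}
	}
	fmt.Println(AddInt(1, 2), SumList(list)) // prints: 3 6
}
```

Every arithmetic operation goes through a type assertion and re-boxes its result into an interface value, which is where I assume most of the overhead comes from.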
I tried to type the bytecode, at least partially, and read some whitepapers on gradual type systems, but so far I have not had any particular success (partly because of lack of time, partly because I’m not really proficient in the subject and had no idea what I was doing). I’m not sure this is solvable at all given only OCaml bytecode, as the OCaml memory representation is Lisp-style and highly dynamic in nature.
I know @EduardoRFS also tried to type the bytecode, but IIRC with additional type annotations embedded in the bytecode, probably coming from typedtree? Given the need for an FFI with the target language, some additional annotations would probably have to be passed from the source representation down to the bytecode to specify concrete target-language types for abstract types used at the OCaml level (i.e. I have type t in OCaml with some raw macro-style bindings for that type, but in Golang I want it to be *bytes.Buffer so that I don’t have to wrap it in a dynamic value container and do type assertions at runtime).
On the other hand, @ostera seems to be working with compiler-libs and typedtree directly in Caramel (the Erlang backend). The original whitepaper on js_of_ocaml suggests that typedtree is fragile while bytecode is stable, and that it’s better to use bytecode for a project to be sustainable. Is that still true nowadays? Or is typedtree stable enough to rely on?
There are obviously more radical choices, such as forking the OCaml compiler altogether like BuckleScript does, but that looks too involved to be feasible.
Would be awesome to hear back from compiler hackers.