This might be a bad time given the announcement of the road to 5.0, but I was wondering if there is any interest/ongoing work to provide an LLVM backend for OCaml.
I found this discussion:
There are repos, but they have all been unmaintained for at least 8 years.
My understanding is that this would let OCaml tap into the optimizations available for LLVM IR and such, at the cost of migration, increased maintenance burden, and probably more. To echo @fyquah95, “I do not have sufficient background to concretely point out what are the exact challenges.” I do not know the exact benefits either.
LLVM targets fewer architectures than OCaml, and apparently is a large contributor to Rust’s long compile times, so I think it’s unrealistic that it will ever be targeted by OCaml. WASM would probably be a better target overall.
WASM sounds like it could be interesting to target. Tangentially, Cranelift exists:
which I have heard anecdotally is better suited for JIT than LLVM (i.e. it has better compile times; I’ve heard poor things about LLVM’s performance on that front), though unfortunately it is not stable yet.
From my understanding, Rust’s compile times are also due to its dependency management and other factors. I’m not sure how much LLVM plays into it, though probably a fair bit.
The main obstacle to compiling OCaml via LLVM is GC support. OCaml’s GC needs some collaboration from the native-code generator in order to find all GC roots present in the stack. AFAIK, there was a nice design document for an LLVM API that would support this and work perfectly with OCaml, but it was never implemented. (This is all from memory; I would need to chase references.)
Without this API, the OCaml compiler could use a shadow stack to manage GC roots, without LLVM’s cooperation, but the resulting LLVM-compiled code would probably run slower than the ocamlopt-generated code.
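To make the shadow-stack idea concrete, here is a minimal sketch in C. It is not how ocamlopt or any LLVM plugin does it; every name here (`shadow_frame`, `push_frame`, `add_root`, `collect`) is hypothetical, and the "collector" is a toy that just copies rooted objects into a new area without forwarding pointers. The point is only to show where the cost comes from: each function registers the addresses of its pointer-holding locals on entry and unregisters them on exit, and those locals must live in memory rather than in registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy heap object that the collector is allowed to move. */
typedef struct obj { int payload; } obj;

/* One shadow-stack frame: addresses of the locals that hold heap pointers. */
typedef struct shadow_frame {
    struct shadow_frame *prev;  /* frame of the caller */
    size_t nroots;              /* number of registered slots */
    obj **roots[4];             /* fixed capacity keeps the sketch short */
} shadow_frame;

shadow_frame *shadow_top = NULL;

/* Function prologue: link a fresh frame on top of the shadow stack. */
void push_frame(shadow_frame *f) {
    f->prev = shadow_top;
    f->nroots = 0;
    shadow_top = f;
}

/* Register the address of a local, so the collector can update it in place. */
void add_root(shadow_frame *f, obj **slot) {
    assert(f->nroots < 4);
    f->roots[f->nroots++] = slot;
}

/* Function epilogue: unlink the frame. */
void pop_frame(void) {
    shadow_top = shadow_top->prev;
}

/* Toy moving collector: copy every rooted object into to_space and rewrite
   the root. Assumes no two roots alias the same object (no forwarding). */
size_t collect(obj *to_space, size_t capacity) {
    size_t used = 0;
    for (shadow_frame *f = shadow_top; f != NULL; f = f->prev) {
        for (size_t i = 0; i < f->nroots; i++) {
            obj **slot = f->roots[i];
            if (*slot == NULL) continue;
            assert(used < capacity);
            to_space[used] = **slot;    /* move the object */
            *slot = &to_space[used++];  /* fix up the root */
        }
    }
    return used;
}

/* A "compiled" function: a and b are forced into memory because their
   addresses escape into the shadow stack. */
int demo(void) {
    obj from_space[2] = { {41}, {1} };
    obj to_space[2];
    obj *a = &from_space[0];
    obj *b = &from_space[1];

    shadow_frame frame;
    push_frame(&frame);
    add_root(&frame, &a);
    add_root(&frame, &b);

    size_t moved = collect(to_space, 2);  /* a GC happens mid-function */
    int relocated = (a == &to_space[0]) && (b == &to_space[1]);
    int sum = a->payload + b->payload;

    pop_frame();
    return (moved == 2 && relocated) ? sum : -1;
}
```

Calling `demo()` runs one simulated collection and checks that both locals were rewritten to point into the new area. The bookkeeping in `push_frame`, `add_root` and `pop_frame` is paid on every call whether or not a collection happens, which is the overhead the post alludes to.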
Yes, this is the “nice design” I remembered. But I still don’t know how much of it is implemented today. In particular, I don’t know how to read the “Overview of available features” table. More information is welcome!
You and me both… I believe the top row lists garbage collection techniques (as described below the table), while the first column lists GC features. The Done column says whether work on a feature is completed, and each GC technique has an X if it has that feature. No idea what the N’s mean, though.
edit: So the N’s might mean “NO”, but I’m not sure.
LLVM targets far more architectures than OCaml. OCaml has no back ends for things like the MSP430, or the Hexagon, or ARC, etc. etc. (I’m not arguing for an LLVM back end. I’m merely correcting a misconception.)
I’m not sure how the Haskell people do this, but the GHC LLVM backend seems fully functional, and they have quite similar issues with finding roots etc. (Again, not arguing for an LLVM back end as such.)
The question is a bit ambiguous because not all back ends are supported by all front ends, but the definitive list of back ends that are in the open source version is in the project sources.
Note that LLIR/Duplo somehow achieved this collaboration with OCaml’s runtime:
Across each call site, the OCaml garbage collector requires a descriptor identified through the program counter to describe the set of locations that are live out of the call and contain heap roots that must be rewritten after compaction or promotion to the major heap. […] A custom pass emits the required descriptor for each pseudo-instruction after all transformations have been applied. The register allocator and several passes were modified to respect the semantics of heap pointers and to avoid incorrectly hoisting, altering or removing instructions operating on them.
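As a rough illustration of what such a per-call-site descriptor buys the collector, here is a simplified sketch in C. The struct layout, the sorted table with binary search, and the names `frame_descr`, `find_descr` and `scan_frame` are all assumptions made for this example; they are not the actual OCaml runtime or LLIR data structures. The idea shown is only the one in the quote: given the return address of a call, look up which stack slots hold live heap roots and let the collector rewrite them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor emitted by the compiler per call site. */
typedef struct {
    uintptr_t return_addr;  /* program counter just after the call */
    size_t frame_size;      /* frame size in words, used to reach the caller */
    size_t num_live;        /* how many slots hold live heap roots */
    int live_slots[4];      /* word offsets of those slots from the frame base */
} frame_descr;

/* Look up a descriptor by return address. The table is assumed to be
   sorted by return_addr; a hash table would serve equally well. */
const frame_descr *find_descr(const frame_descr *table, size_t n, uintptr_t pc) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].return_addr == pc) return &table[mid];
        if (table[mid].return_addr < pc) lo = mid + 1;
        else hi = mid;
    }
    return NULL;
}

/* Hand every live root of one frame to the collector's visitor. */
size_t scan_frame(const frame_descr *d, uintptr_t *frame_base,
                  void (*visit)(uintptr_t *root)) {
    for (size_t i = 0; i < d->num_live; i++)
        visit(&frame_base[d->live_slots[i]]);
    return d->num_live;
}

/* Stand-in for "rewrite the root because the object moved". */
void bump(uintptr_t *root) { *root += 8; }

int demo_scan(void) {
    const frame_descr table[] = {
        { 0x1000, 4, 1, {2} },
        { 0x2000, 4, 2, {0, 3} },
    };
    /* Slots 0 and 3 hold heap pointers; slots 1 and 2 hold plain integers. */
    uintptr_t frame[4] = { 100, 7, 9, 200 };

    const frame_descr *d = find_descr(table, 2, 0x2000);
    if (d == NULL) return -1;
    scan_frame(d, frame, bump);

    /* Only the slots named by the descriptor were touched. */
    if (frame[1] != 7 || frame[2] != 9) return -1;
    return (int)(frame[0] + frame[3]);
}
```

Unlike a shadow stack, nothing is executed on the normal call path here: the tables are static data, and they are consulted only when a collection actually walks the stack. That is why the code generator has to cooperate, since only it knows which slots and registers are live at each call.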
But obviously, it comes at the expense of not using LLVM IR but LLIR.
Code in LLVM IR is target-dependent. Unlike with the JVM, porting to LLVM actually means porting to Windows x86-64 via LLVM, Linux Arm via LLVM, and so on. LLVM will not give us dozens of targets for free.
You will probably want to implement a calling convention (in C++) for register pinning and such.
Alias analysis is another customizable component you may want to implement.
That said, a compiler backend for LLVM, or MLIR, has excellent potential: LTO, JIT compilation, and automatic vectorization, to name a few. Probably too heavy for maintainers and core developers with a lot of TODOs, but great for motivated external contributors?