OCaml LLVM backend?

Yam76 · November 5, 2021, 8:43pm

This might be a bad time given the announcement of the road to 5.0, but I was wondering if there is any interest/ongoing work to provide an LLVM backend for OCaml.

I found this discussion:

There are repos, but they have all been unattended for at least 8 years.

My understanding is that this would let OCaml tap into the optimizations available for LLVM IR. It would come at the cost of migration, increased maintenance burden, and probably more. To echo @fyquah95, “I do not have sufficient background to concretely point out what are the exact challenges.” I do not know the exact benefits either.

yawaramin · November 5, 2021, 9:43pm

LLVM targets fewer architectures than OCaml, and apparently is a large contributor to Rust’s long compile times, so I think it’s unrealistic that it will ever be targeted by OCaml. WASM would probably be a better target overall.

Yam76 · November 5, 2021, 11:13pm

WASM sounds like it could be interesting to target. Tangentially, Cranelift exists:

which I have heard anecdotally is better suited for JIT than LLVM (i.e. has better compile times, I’ve heard poor things about LLVM’s performance on that front), though unfortunately it is not stable yet.

Yam76 · November 5, 2021, 11:22pm

Also, from my understanding, Rust’s compile times are partly due to its dependency management and other factors. I’m not sure how much LLVM plays into it; probably a fair bit.

yawaramin · November 6, 2021, 2:00am

Depends who you ask. In the forum posts I’ve seen, Steve Klabnik largely attributes the compile times to LLVM.

silene · November 6, 2021, 5:59am

There are some more recent projects, such as Duplo/LLIR: GitHub - nandor/llir-opam-repository: Opam repository for Duplo-Optimised OCaml

xavierleroy · November 6, 2021, 6:44pm

The main obstacle to compiling OCaml via LLVM is GC support. OCaml’s GC needs some collaboration from the native-code generator in order to find all GC roots present in the stack. AFAIK, there was a nice design document for a LLVM API that would support this and work perfectly with OCaml, but it was never implemented. (This is all from memory; I would need to chase references.)

Without this API, the OCaml compiler could use a shadow stack to manage GC roots, without LLVM’s cooperation, but the resulting LLVM-compiled code would probably run slower than the ocamlopt-generated code.
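
For illustration, a shadow stack can be modelled as an explicit side stack of root slots that compiled code pushes on function entry and pops on exit, so the collector never has to parse the machine stack. The following is a toy OCaml model only, not code from the OCaml runtime or LLVM; all names (`shadow_stack`, `with_roots`, `scan_roots`, `relocate_all`) are hypothetical, and heap addresses are simulated with plain integers.

```ocaml
(* Toy model of a shadow stack. A root slot is a mutable cell holding a
   simulated heap address, so a "moving" collector can rewrite it in place. *)
type root = int ref

(* The shadow stack: one frame (a list of root slots) per active call. *)
let shadow_stack : root list Stack.t = Stack.create ()

(* Run [f] with [roots] registered; the frame is popped even if [f] raises. *)
let with_roots (roots : root list) (f : unit -> 'a) : 'a =
  Stack.push roots shadow_stack;
  Fun.protect ~finally:(fun () -> ignore (Stack.pop shadow_stack)) f

(* What a collector would do: visit every registered root slot. *)
let scan_roots (visit : root -> unit) : unit =
  Stack.iter (fun frame -> List.iter visit frame) shadow_stack

(* Pretend collection: "move" every object by adding an offset to its address. *)
let relocate_all (offset : int) : unit =
  scan_roots (fun r -> r := !r + offset)
```

In this model every call pays for a push and a pop, and roots have to live in memory cells rather than registers, which gives an idea of where the slowdown mentioned above would come from.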

Yam76 · November 7, 2021, 4:55pm

I found these two links:
https://llvm.org/docs/GarbageCollection.html#the-erlang-and-ocaml-gcs
https://llvm.org/docs/Statepoints.html

I thought the first one was interesting because they specifically mention producing a binary format modelling the OCaml compiler.

xavierleroy · November 7, 2021, 5:14pm

Yes, this is the “nice design” I remembered. But I still don’t know how much of it is implemented today. In particular, I don’t know how to read the “Overview of available features” table. More information is welcome!

Yam76 · November 7, 2021, 5:49pm

You and me both… I believe the top row lists garbage collection techniques (as described below the table), while the leftmost column lists GC features. The Done column indicates whether work on a feature is completed, and each GC technique has an X if it has that feature. No idea what the N’s mean, though.
edit: So the N’s might mean “NO”, but I’m not sure.

perry · November 7, 2021, 11:07pm

LLVM targets far more architectures than OCaml. OCaml has no back ends for things like the MSP430, or the Hexagon, or ARC, etc. etc. (I’m not arguing for an LLVM back end. I’m merely correcting a misconception.)

perry · November 7, 2021, 11:09pm

I’m not sure how the Haskell people do this, but the ghc LLVM backend seems fully functional, and they have quite similar issues on finding roots etc. (Again, not arguing for an LLVM back end as such.)

yawaramin · November 8, 2021, 12:30am

Out of curiosity, can you point me to the reference list of LLVM’s supported architectures?

perry · November 8, 2021, 1:36am

The question is a bit ambiguous because not all back ends are supported by all front ends, but the definitive list of back ends that are in the open source version is in the project sources.

kit-ty-kate · November 8, 2021, 2:13am

You can get the full list here: llvm-project/llvm/lib/Target at main · llvm/llvm-project · GitHub. At this time in the main branch:

AArch64
AMDGPU
ARC
ARM
AVR
BPF
CSKY
Hexagon
Lanai
M68k
MSP430
Mips
NVPTX
PowerPC
RISCV
Sparc
SystemZ
VE
WebAssembly
X86
XCore

Unless I’ve missed something, LLVM can target everything that OCaml can (arm, x86, powerpc, s390x/Z, riscv).

silene · November 8, 2021, 5:08am

Note that LLIR/Duplo somehow achieved this collaboration with OCaml’s runtime:

Across each call site, the OCaml garbage collector requires a descriptor identified through the program counter to describe the set of locations that are live out of the call and contain heap roots that must be rewritten after compaction or promotion to the major heap. […] A custom pass emits the required descriptor for each pseudo-instruction after all transformations have been applied. The register allocator and several passes were modified to respect the semantics of heap pointers and to avoid incorrectly hoisting, altering or removing instructions operating on them.

But obviously, it comes at the expense of not using LLVM IR but LLIR.
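
To make the idea of per-call-site descriptors concrete, here is a minimal OCaml sketch of a frame table lookup during a stack scan. It is a simplified model, not the actual runtime or LLIR code: the field names loosely follow the frame descriptors of the classic OCaml runtime, while the stack layout, the addresses, and the functions `build_table` and `find_roots` are assumptions made for the example.

```ocaml
(* Toy model of a frame table: one descriptor per call site, keyed by the
   return address. Addresses and layouts here are made up for illustration. *)
type frame_descr = {
  retaddr : int;         (* return address identifying the call site *)
  frame_size : int;      (* size of the frame, in words *)
  live_slots : int list; (* word offsets in the frame that hold heap roots *)
}

let build_table (descrs : frame_descr list) : (int, frame_descr) Hashtbl.t =
  let tbl = Hashtbl.create 16 in
  List.iter (fun d -> Hashtbl.replace tbl d.retaddr d) descrs;
  tbl

(* Walk a simulated stack. Each frame is laid out as
   [return address; slot 1; ...; slot (frame_size - 1)], innermost first.
   Returns the stack indices of all live roots. *)
let find_roots tbl (stack : int array) : int list =
  let rec walk sp acc =
    if sp >= Array.length stack then List.rev acc
    else
      match Hashtbl.find_opt tbl stack.(sp) with
      | None -> List.rev acc (* unknown return address: stop scanning *)
      | Some d ->
        let roots = List.map (fun ofs -> sp + ofs) d.live_slots in
        walk (sp + d.frame_size) (List.rev_append roots acc)
  in
  walk 0 []
```

In this scheme the mutator does no bookkeeping at call time; the cost is paid only when the collector scans the stack, but the code generator has to emit an accurate descriptor for every call site.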

XVilka · November 10, 2021, 7:13am

Yes, see this code for the reference:

Kakadu · July 24, 2023, 10:35am

I grepped the sources of LLVM 16 and found a few relevant places.

https://github.com/llvm/llvm-project/blob/release/16.x/llvm/lib/IR/BuiltinGCs.cpp
The declaration of an OCaml-like GC (based on 3.10). It requires safe points and ‘metadata’ to work properly.
https://github.com/llvm/llvm-project/blob/release/16.x/llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp
An emitter for the OCaml frame table described here. I have no idea whether it is similar enough.

I probably need to check more GC features to make a better evaluation. Could somebody name them?

omasanori · July 24, 2023, 12:25pm

Some remarks:

Code in LLVM IR is target-dependent. Unlike with the JVM, porting to LLVM actually means porting to Windows x86-64 via LLVM, Linux Arm via LLVM, and so on. LLVM will not provide us with dozens of targets for free.
You will probably want to implement a calling convention (in C++) for register pinning and such.
Alias analysis is another customizable component you may want to implement.

That said, a compiler backend for LLVM, or MLIR, has excellent potential: LTO, JIT compilation, and automatic vectorization, to name a few. Probably too heavy for maintainers and core developers with a lot of TODOs, but great for motivated external contributors?

chris-armstrong · June 2, 2025, 6:47am

An interesting article popped up recently demonstrating a single pass LLVM (-O0) backend for fast machine code generation from LLVM IR.

(maybe not entirely relevant to this discussion, but interesting from the perspective of simplifying compiler backends and making them fast, a philosophy which seems to be shared by the OCaml community)
