From what I can gather, the primary challenges are the calling convention and garbage collection, but I don’t have enough background to point out concretely what the exact challenges are.
GHC now uses LLVM, and there were a bunch of additions to the LLVM IR to make that possible. The LLVM people seem to be very willing to make IR alterations to make it easier for modern languages to adopt LLVM as a back end.
It certainly has a bunch of advantages for any language front end using it. Among other things, you don’t have to worry about your own back end code generator details, and you get to take advantage of the loads of optimizations the LLVM people have in their infrastructure for free.
One disadvantage is that LLVM makes quite regular changes and you have to keep up if you want your front end to work with newer versions of LLVM.
It is worth noting that LLVM is not really a good back-end for languages with garbage collection, or for functional languages in general. It doesn’t provide support for a proper tracing GC; it’s slowly getting better, but the last time I checked it still wasn’t good enough.
GHC gets around this by using a shadow stack, which slows things down and somewhat defeats the point of using LLVM’s backend, since a lot of the optimisations won’t apply. Of course, GHC was already using a shadow stack for other reasons, so LLVM was probably still an improvement for them.
So personally I would expect an LLVM back-end to be worse than the current back-end for most OCaml code. It might improve low-level numeric code if written in the right style, but for most things I would expect any gain from the better instruction selection to be offset by the cost of spilling everything all the time.
I’m not sure that’s true as such. That is to say, yes, you have to generate code to track roots and for the GC operations yourself, but I don’t see that as much of a problem. After all, your code generator does know already which things are pointers (and indeed, LLVM’s low level representation is typed.)
What would you want the back end to do to make garbage collection simpler, in your opinion? The LLVM people are very open to additions to their system.
I think it also helps with all sorts of optimizations. LLVM has some very nice optimizer passes, and writing new ones, if any would help OCaml in particular, would be straightforward. It should even be possible to write some of those in OCaml. Another big win is not having to maintain back ends for all sorts of operating systems and architectures.
(BTW, it’s very weird to be in a discussion on this site where I actually kind of know something. I’m very new at OCaml but I know compilers. )
The problem is that you cannot have GC roots in registers across points where the GC might run. This is not really acceptable for a proper GC. The additional spilling will severely harm performance.
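To make that concrete, here is a hedged OCaml sketch (the function name is invented for illustration): any heap value that is live across an allocation point is a GC root at that point, and a backend that cannot describe roots held in registers has to spill them to the stack around every such point.

```ocaml
(* Illustration only: [pair] is live across the allocation of [cell].
   If the GC runs during that allocation (and, being a moving GC,
   relocates [pair]), the runtime must be able to find and rewrite
   the register or stack slot holding [pair]. A backend that cannot
   record roots in registers must spill [pair] around the allocation. *)
let sum_with_cell (pair : int * int) =
  let cell = ref 0 in          (* allocation: a potential GC point *)
  cell := fst pair + snd pair; (* [pair] is used after the allocation *)
  !cell
```

The OCaml-level code is ordinary; the spilling happens in the generated machine code, which is why the cost is invisible in the source but shows up in benchmarks.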
Hmm. I’ve had a look at the most recent work on GC support, and it might now be expressive enough. As I said, it has been slowly getting better. It still looks like it’s going to unnecessarily get in the way of a lot of LLVM’s optimisations. Given previous attempts to use LLVM for GC I would enter this area with caution.
Personally, I also feel that LLVM is still over-fitted for C/C++ compilation (well, for Clang specifically). And that over-fitting is a particular problem because the whole idea of LLVM, a single IR shared by all consumers, producers and optimisations, makes it hard to extend without changing all of those consumers, producers and optimisations (or worse, neglecting to change the optimisations and ending up with subtle miscompilations).
I’d rather see effort spent on improving OCaml’s current back-end. I strongly suspect that a more modern register allocator and some work on instruction selection would have a better ratio of development effort to performance gains.
Well, one of the nice things about LLVM is it is purely a back end. That is to say, it’s a pretty low level representation. And one of the nice things about the OCaml compiler is that it’s retargetable. So one can experiment with this without terribly much commitment, and even maintain both native and LLVM targets.
Why is it worse than in code you generate for yourself? Just the lack of direct access to the lower-level, post-register-allocation representation for open-coding barriers? The problem of having to find roots inside registers when a GC is triggered from an alloc operation happens regardless of whether you’re generating your own code, of course. I think it is probably possible to fix most of the painful bits with some hooks added to LLVM.
LLVM is highly Clang oriented, but they don’t want it to be long run, and I think there are enough other customers at this point that it should be feasible to add the tweaks a modern high level functional language would need for efficient GC. It’s a matter of thinking it through.
Why is it worse than in code you generate for yourself? Just the lack of direct access to the lower level post-register allocated representation for open coding barriers?
Yeah, that’s the problem: you need to pass the register information to the runtime. Although, to be fair, it has been a long time since I last looked at this, and I had somewhat conflated two separate issues in my mind. For a long time, the only way to use a GC with LLVM was a shadow stack, which suffers large costs from “spilling” things to and from the shadow stack. Then they improved it so you could leave things on the proper call stack, but you still couldn’t create stack maps recording which registers the GC needs to rewrite, which is what OCaml’s runtime requires. That is less costly to work around than the old shadow stack approach, though.
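For readers unfamiliar with the term, here is a minimal OCaml sketch of the shadow-stack idea (all names are invented, and real implementations operate on the generated code rather than at the source level): roots are copied into a side structure the collector can walk, and the copying in and out is exactly the “spilling” cost mentioned above.

```ocaml
(* Hypothetical illustration of a shadow stack: a side list of roots
   that a collector could scan, independent of registers and the real
   call stack. Every protected value pays a push and a pop. *)
let shadow_stack : Obj.t list ref = ref []

(* Register [v] as a root for the duration of the call to [f]. *)
let with_root (v : 'a) (f : 'a -> 'b) : 'b =
  shadow_stack := Obj.repr v :: !shadow_stack; (* "spill" to the shadow stack *)
  let result = f v in
  shadow_stack := List.tl !shadow_stack;       (* pop on the way out *)
  result

let example () =
  with_root [ 1; 2; 3 ] (fun xs -> List.length xs)
```

A real shadow stack wraps every GC-visible value this way in the emitted code, which is why the per-call overhead adds up.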
but they don’t want it to be long run
And maybe in the long run it would be a good idea to use it for OCaml, but until then I wouldn’t want to see OCaml’s very limited manpower spent on trying to get LLVM to that point. Of course, if someone really wants to spend their time doing that, then that is their choice; I just think people should be aware that LLVM is not a magic drop-in replacement for the backends of functional languages: it needs real work to reach maturity on that front.
As I hinted at before, I’m also not personally a fan of LLVM’s whole approach. I think that using a single fixed IR is the wrong approach in the long run, due to the constraints it places on extensibility and the difficulties it inevitably has with compatibility. Really, optimisations should be closer to functors that can work on any IR providing the required operations (and one day, with any luck, the proofs of the properties they depend on).
There is also always a cost to the generality of any kind of shared backend. Many optimisations are more efficient and easier to implement with higher-level information. There have certainly been projects, for example WebKit’s FTL JIT backend, which ended up switching away from LLVM because, with language-specific information, they could get the same performance from a much simpler and more efficient implementation.
So I spent some time this afternoon looking at it, and it appears that LLVM now supports IR primitives for most of what you want, including marking gc roots (including on the stack and in registers), creating stack maps, etc. The support looks pretty sophisticated, and is in use in a commercial native code Java compiler and several front ends that use garbage collection. It is still required that one provide one’s own garbage collector, however.
I think this isn’t perfect yet, but it goes quite far, even providing reasonable support for parallel collection via barriers, and for generational GCs.
Well, on the extensibility front, the LLVM people are big on changing the IR in response to feedback from front end users (they made substantial changes for the GHC people, for example). I agree that LLVM is very far from ideal, but it is nonetheless catching on a lot for a good reason: it makes the lives of its users significantly easier. Building back ends for every processor in sight, and every object file format, and dealing with all the world’s debugging information formats, gets to be quite a drag. It’s nice to have someone else do it for you.
THAT SAID: I am not suggesting that OCaml should or must switch to using it. Among other things, I’m not prepared to do the work to find out whether it works well, and in open source projects one depends on people wanting to do the work for it to happen. One can’t order other people to find something interesting. It is also not clear to me that it would be a win, just that it might be, and it’s a lot of work to do for an experiment.
However, it is clear to me that it would be an interesting experiment for someone with an itch to do it to try.
For that, I think the trick would be to do those optimizations before lowering to LLVM.
BTW, the LLVM link-time optimization stuff might provide some interesting opportunities, for example handling parametric polymorphism by polyinstantiation followed by optimization of the type-specific instantiations, to take advantage of type-specific optimization opportunities. But I know little of the innards of the OCaml compiler, and it might already do a sophisticated job of that.
Flambda does that, with inlining, function specialization, etc. It is an optimization pass that is not enabled in the default OCaml compiler. Unlike C/C++ compilation, when compiling a .ml file the compiler can be given visibility into the IR (in the form of .cmx files) of the modules it depends on, so various cross-module optimisations can be done at this stage of compilation.
On a higher level, I would doubt that LLVM’s LTO would make a big difference without a substantial amount of work. Flambda does a pretty decent job with type-specific optimisation, especially for floating-point arguments, where it substantially reduces the number of allocations. That said, there are optimisations beyond inlining and allocation removal in LLVM’s LTO, but I am not sure how those would compare against what is done during linking in OCaml.
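As a hedged illustration of the kind of opportunity being discussed (a sketch of the general idea, not a claim about what Flambda actually emits): a polymorphic higher-order function applied to floats boxes its intermediate results by default, and inlining plus specialisation at a call site the optimiser can see can remove those allocations.

```ocaml
(* A generic fold. Compiled generically, each float produced by [f]
   is boxed on every iteration. If the optimiser can see this
   definition from the call site (e.g. via .cmx files), it can
   inline and specialise the fold for floats, keeping the
   accumulator unboxed and eliminating those allocations. *)
let rec fold f acc = function
  | [] -> acc
  | x :: rest -> fold f (f acc x) rest

(* A call site the optimiser could specialise for floats. *)
let total = fold ( +. ) 0.0 [ 1.0; 2.0; 3.5 ]
```

This is the floating-point case mentioned above: the win is not a faster add, it is the allocations that disappear.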
I didn’t know that. This implies there isn’t really separate compilation, then, in the sense that one must have the IR of a module one depends on around at compile time, and that if I re-compile a module that something depends on, the dependent modules must be recompiled? (I’m a newbie trying to understand the model.)
Oh, there’s no question about that! It’s not built for this as things stand. But it could be extended to do it. Then again, such extensions would need to be written in C++, which isn’t exactly my idea of a fun time.
Bytecode - recompilation is not needed unless you change the .mli file.
Native code - recompilation is needed if you expose .cmx files in the include paths, which is usually the case. It is possible to hide the .cmx files by not putting them on the include path when invoking ocamlopt; doing so, however, misses out on a lot of optimization opportunities.
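A small in-source analogue of the .mli distinction may help (the module and its contents are illustrative): dependents compile against the interface only, so in bytecode an implementation change is invisible to them, while native compilation with visible .cmx files also exposes the implementation for inlining, which is why changing it can force dependents to recompile.

```ocaml
(* The signature plays the role of a .mli: users of [Counter] see
   only [make]. With bytecode, changing the struct body does not
   force clients to recompile, since they depend only on the
   interface. With native code and visible .cmx files, the
   implementation IR is also exposed for cross-module inlining,
   so clients are recompiled when the implementation changes. *)
module Counter : sig
  val make : unit -> unit -> int
end = struct
  let make () =
    let n = ref 0 in
    fun () -> incr n; !n
end

let c = Counter.make ()
```

The trade-off in the post above is exactly this: hiding the implementation buys compilation independence at the cost of the optimisations that need to see it.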
@fyquah95 Thank you! That was a helpful piece of information. I think now that I’ve been using the language for a while I should re-read RWO, there’s a bunch of stuff I didn’t pay close attention to the first time through.