The Rust community owes the OCaml community a debt of gratitude, the first Rust compiler was written in OCaml. In this post we rewrite the OCaml runtime in Rust. Consider it our down payment on repaying the debt. ![]()
No Marks Shinwell were harmed[1]
More seriously
The OCaml runtime is written in C. I present a port of the C to Rust. It works. It passes the OCaml compiler’s test suite. The test suite is upstream’s, unmodified. The OCaml compiler with the Rust runtime can build itself, can build dune, install an opam switch, and build arbitrary OCaml programs in both bytecode and native. There’s no C left in the runtime. The performance is… surprising! The most fun part was what it taught me about OCaml.
The fork is here https://github.com/mbacarella/rustcaml, if you want to check it out. You just clone the repo, be on the rust-runtime branch and run the normal build instructions. Make sure you have Rust cargo installed. There’s also a rust-runtime-nightly branch that runs the interpreter faster. More on these below.
Consider the rest of this post more as an experience report and proto case-study of a human steered AI driven rewrite of a mature, sophisticated codebase: the OCaml runtime.
The way this was done was not by saying “yo Claude, spin up an agent swarm or whatever, you’re going to greenfield design and implement an OCaml runtime from scratch in Rust. Make no mistakes”. Rather, the instructions were more like “lets file-by-file, line-by-line translate the C code to equivalent Rust code and check with me every step of the way”.
Despite fairly low-tech orchestration, normal person token budget and Claude’s protests, it worked!
Wait, isn’t OCaml written in OCaml?
Quick detour for readers wondering why OCaml isn’t written in OCaml. Most of it is! The compiler, the standard library and the tooling are. But the runtime is written in C. The source tree ships with boot/ocamlc, a bytecode build of the compiler from a previous version. To build OCaml from source, you first build a C program called ocamlrun that knows how to execute OCaml bytecode. Then you use ocamlrun + boot/ocamlc to compile the current sources into native ocamlc. Why not write the runtime in OCaml too? You could, but one reason not to is then you’d have to ship prebuilt binaries for every architecture and OS you plan to support, forever. There are other good reasons too, which will be explored below.
A word on methodology
I used Claude Code for this with the Opus 4.7[2] model on an x86-64 Linux box. Claude did most of the tactical work: coding and presenting a menu of choices on where to go next, and running the tests. I did the steering and some pep talking (more discussed below).
The Bun project recently reported porting 960k(!) lines of Zig to Rust, which generated a fair bit of, let’s say… excitement. I’ll confess my eyes glaze over the moment I see “JavaScript” so I come to this kind of late. But the comparison is instructive because of the differences: Bun runs on JavaScriptCore, so the garbage collector and the interpreter are C++ and were never in scope for the Rust port. Gnarly as that rewrite was, it’s kind of machinery wrapped around a pre-existing engine. It side-steps porting the really interesting part, IMO.
So, that makes this project different. The OCaml runtime is an expertly engine written C: sophisticated multicore machinery, a real garbage collector, FFI contracts that have to be honored to the byte. None of this stuff is part of the normal programmer experience. How would an LLM do at porting this? I wanted to know!
The plan was simple but I think worth spelling out. Make sure the compiler is never not able to run. Put a per-file toggle in the build: flip it and the linker selects the Rust version of a file instead of the C version. This let us port one file at a time, run the full test suite after each flip, and commit a known-good state before moving on.
Each .c file was effectively line-by-line translated to a .rs file as closely as could be done, as that would make it easiest to track and incorporate upstream changes in OCaml. The port can grow with older brother OCaml.
The purpose of the git commit after each port was so that if we get really stuck in the near or far future, like we discover an application that trips an extremely obscure bug, we’ll be able to bisect to any point in time to figure out where in the porting effort the divergence was introduced. The OCaml runtime was never not in a state where it couldn’t run, whether it was 2%, 50% or 99% ported to Rust.
We ended up with two branches. A rust-runtime branch which was kept to Rust stable, and then rust-runtime-nightly, which added Rust nightly features to experiment with some performance.
Benchmarks
Okay, I made you wait long enough. You’re all curious. My prediction going into this was that the Rust-based native executables would be about 10-20% slower and the byte code interpreter would run 20-30% slower.
This was my expectation given how I had constrained the port, no use of idiomatic Rust, match the C line-by-line as closely as possible, stick to Rust stable features, and don’t use the computed gotos.
The headline results are “eh, just about parity, kinda”
| Runtime | Bytecode (cycles vs C) | Native (cycles vs C) |
|---|---|---|
| C (trunk) | 1.00x: (baseline) | 1.00x (baseline) |
| Rust (stable) | 1.44x, slower than C | ~1.05x (range: 0.87-1.13) |
| Rust (nightly) (ETCs) | 0.91x, faster than C | same as Rust stable |
According to a selection of sandmark tests, the native executables are at about parity. The byte code interpreter, on the other hand, runs about 2x slower in Rust stable because of missing computed gotos. If we switch to Rust nightly and add explicit tail calls (ETCs), either equals or even slightly beats the C runtime.
I was expecting the non-idiomatic translations to be slower across the board since you’re fighting the language instead of letting it shine. But it turns out C-like-Rust, albeit with some warts, can perform about as well as C.
Full table of benchmarks are at the end.
2015 unsafes
ocaml/runtime % rg unsafe *.rs | wc -l
2015
Let’s also get this out of the way: this is not safer for being in Rust. This cut is probably slightly unsafer than the C, if anything. But the ~2015 unsafes aren’t a translation failure. Rust simply makes the unsafe that was already there more legible.
Three reasons the number can’t be much lower, most irreducible first:
- Type erasure. After compilation, every value is one machine word tagged int or heap pointer, and the runtime spends its whole life reading and writing untyped words. There are no static types at that level for Rust to check; every field access is ~a raw pointer deref. The borrow checker can’t work with that.
- The runtime is the GC. It mutates and moves live pointers during collection. This is precisely the set of things the borrow checker exists to forbid. Any Rust port of language runtimes will struggle with this, human or machine written!
- It must speak the C ABI because that’s the OCaml runtime ABI. Further, Rust has no stable FFI of its own, so extern “C” is the only way to interoperate between languages. This is just unavoidably unsafe.
A fourth reason is on me: I boxed (ha!) Claude into translating the C line-by-line instead of idiomatically, to track upstream, so some of these preserve the C correspondence and could be encapsulated later if the project ever became unconstrained.
A lot of unsafe can’t be removed without breaking OCaml as it is known. In this sense, the introduction of unsafe is simply revealing how much of what goes on inside of the runtime is carefully managed unsafety.
That all said, this could be a good base to increase safety overall if the project diverges from build and binary compatibility with OCaml.
More musings
When I first asked Claude to take inventory and propose porting this project it thought it was fairly crazy and unlikely to succeed. At some point it predicted subtle errors would creep in and become quasi-irreducible and it would take 2-3 years to finish, if we didn’t get bored of it by then.
I had to kind of give it a pep talk and go over ways we could tread lightly and keep our confidence high. It reminded me of working with human developers, in a way.
Not that it wasn’t correctly calibrated! The OCaml runtime does push C to its limits! It uses a lot of very clever macros to make the developer experience manageable. The multicore system makes strong guarantees. The foreign function interface (FFI) depends on C ABI compatibility, and that must be maintained forever as Rust has no standard ABI of its own. Once in awhile it would port a file, fail to make test progress, get discouraged and revert it and suggest we try something else for now.
Here in meat space, the whole project took about 7 days. Most of it was spent waiting on me to approve commands while I went about my normal life (you can direct Claude Code from the mobile app but it’s still pretty flakey and hangs a lot). Unlike the experiment with the Claude C Compiler, I was not using “Claude Teams”, or any other kind of multi-agent orchestration. I only had one agent going. A lot of time was also spent waiting for test suites to re-run every time we ported one of the 71 runtime/*.c files to .rs. Iterating and isolating and fixing failures that did crop up took many hours at a time.
Regarding workflow, I did very little gatekeeping of the code. In my experience, line-by-line translations are something LLMs excel at. Also I wouldn’t have been able to review ~40,000 lines of code. Rather, I spent more time at the file unit level and decided which files to port in what order and ensuring these were being exercised in tests. Mostly I kept imagining different ways of how horrible it would be if we passed every test we could think of and then ran some obscure application and got a weird heap corruption bug and had no idea where to start. Keeping that in mind reined the LLM in a bit.
Claude’s carefulness discipline was good but not great
The compiler testsuite and sandmark were our guiding light and Claude never once attempted to hack them to make progress. Claude did get confused once in awhile though. Early on, a test triggered a segfault and Claude kept going, continuing to port more files over. When I pushed back and asked why it didn’t investigate the segfault, it said it assumed that the segfault was always there, even in the trunk version, and never thought to check.
This happened a few times, actually. In long running sessions, especially if context has been compacted, it can lose track of what the gold condition of the tests is and convince itself that the current test failures were not introduced by its changes. I would have to remind it to run the testsuite on trunk and see that they all come back green to really get the point home.
I know I said LLMs excel at line-by-line translations, and they do. They’re better than humans are at it, though they’re not flawless. Aside from bulldozing past the segfault that one time, it also made mistakes like not reproducing big enums perfectly, like if they started at a different index from 0. Once in awhile it gets magic numbers backwards in initializers (e.g. { a=2, b=1 } instead of { a=1, b=2 }. If there’s a lot of stuff to manually transcribe it was smart enough to write some sed/awk scripts to translate them, but not always. Generally these mistakes showed up quickly in tests, but took a lot of time for it to iterate on to spot and fix.
Claude frequently doubted itself but with simple prodding it would do the work and succeed spectacularly. Other times it grouped things up not realizing they could still be independent. E.g. it decided we needed to port the entire GC at once, which spanned several large files, even though we’ve been following a one-by-one porting strategy the entire time, and was getting ready to try all four at once and really knuckle down trying to debug 5000 lines of very intensive Rust. But when reminded of the fact that we’d been following a by-file strategy the whole time and asking why it couldn’t work here, it admitted it could, and then did it.
Overall, Claude had fairly good instincts. It felt like pair programming with someone who’d internalized the OCaml runtime, C and Rust all at once. But also like pair programming with someone that had ADHD.
Tests were done
Tests run
- OCaml compiler’s test suite
- The fixpoint test: having the compiler build itself and then using that to build itself again and confirming the two generations of compiler code are identical
- Sandmark benches, both A/B test and performance
- TSan tests
- Build an opam switch with the compiler (basically, tell it the system compiler is at
~/code/ocaml) - Build dune and run its test suite
opam installsome community packages- iter, gen, qcheck-core
- fmt, logs, astring, rresult, jsonm, uucp, uuseg
- digestif, decompress, optint, ke, hxd, checkseum
- mpg123, curses
As a pedagogical tool
One thing I underestimated going in is the educational value of having an LLM translate code, especially this code, while you watch. Reading the C directly gets you a partial picture, you see what each function does and why it’s structured that way. Asking the model to translate it to another language seems mechanical enough, but the places where it stumbles or prompts a choice reveals things you might be missing.
Halfway through we were translating lf_skiplist.c, and I realized I’d never used a skip-list before in my life[3]. Reviewing it, I wondered why this wasn’t a tree? It would be a lot easier for Rust to represent it, that’s for certain. But then I realized the difference is the concurrent, lock-free part is actually doable with a skip-list in a way that would be extremely hard if it was a tree. This is a really elegant decision.
As an aside: if you’ve ever thumbed through the Algorithm Design Manual, perhaps fallen asleep at night with it on your chest, and lamented that, despite there being like 75+ algorithms catalogued, you only use the top 5, maybe consider a career in compilers.
Tying up the lock-free skip-lists part. There’s a bit of a meme about how hard it is to do linked lists in Rust because it maps very poorly to its lifetime model. The Rust port side-steps the borrow-checker completely here by preserving the raw pointer manipulations. It’s actually merely as hard as the C version if you useunsafe ![]()
A second, more interesting problem was the bytecode interpreter. It leverages computed gotos for a big performance win. Computed gotos are a non-standard extension to C compilers (supported in GCC and LLVM). I knew these existed but the extension is so niche I had also never used one in my career.
The reason computed gotos are fast for interpreter loops is not because they optimize jumping to handlers, but because they make the jump predictable to the branch predictor inside of the CPU. The Python interpreter had a rather big performance regression recently because they stepped on a toolchain footgun that altered the behavior of this heuristic. But, computed gotos are not representable in Rust stable. More on the workaround in the “rust-runtime-nightly” section below.
A last, more minor surprise to me, and a bit of an embarassment since I’ve been in OCaml land for so long, is that I had a mistaken impression of how runtime values are encoded. I knew OCaml’s runtime steals a bit from each value to flag whether it’s an immediate (an int63) or a heap pointer. But I had it backwards, I thought it stole the top bit and simply halved the address space it could address since on 32-bit platforms only 2GB is effectively usable for a heap anyway (the kernel reserves half), and on 64-bit platforms you won’t really suffer if you’re restricted to only 9 quintillion bytes. But it’s the opposite: it steals the bottom bit. If the bit is set to 0 it just knows it’s a pointer because all heap pointers are 4 or 8 byte aligned. Those have the bottom bit set to zero naturally. If the bottom bit is set to 1, it knows the top 63-bits are an integer, and it simply shifts it downwards. In the space of programming language GCs, this is a nuanced trade-off and each does it a little differently. In OCaml it means pointers require no decoding to use and immediates only take a shift to decode. In some cases math operations can be done to the encoded values directly, no shift required.
It’s worth noting that two of these, the computed gotos and the value encoding, are major reasons why the OCaml runtime can’t be optimally written in OCaml itself. The runtime has to live below the abstractions the language provides and that’s one reason to choose something lower level, like C. And why translating it into more idiomatic Rust is not trivial.
Inline assembly, for compatibility!(?)
Three features used by the C based runtime were only available in Rust nightly. Rust nightly can change on you at any time and shouldn’t be relied on, so you’re encouraged to stick to Rust stable if you want timeless software. Given the constraint, the LLM cheerfully offered to implement them in inline assembly to stay on Rust stable ![]()
- Thread local storage: normally available in C compiler extensions. A form of this exists in the Rust standard library, but because we’re using
no_stdwe had to either go with Rust nightly or roll or our. - Variadic functions: A basic feature in C. Rust can call into variadic functions but can’t define them without an extension from Rust nightly. The inline assembly version of this is not too horrible.
- You cannot initialize a Rust static from the address of another Rust static, but this is needed to implement ephemerons in OCaml. C handles this with link-time address relocations. Rust const eval refuses. Inlined assembly solution: just mark some things
.weakin an inline asm block and you’re done.
One feature in C has no Rust equivalent. setjmp/longjmp, or non-local gotos. The interpreter needs these for exception handling. Rust cannot safely call sigsetjmp and be called into by sigsetlongjmp. But you can make an inline assembly version! There’s a lot of concerns on doing this generally in Rust because of returns_twice, but Claude audited the two call sites that use it and decided they don’t rely on persisting local variables that the returns_twice presents clobbering.
Introducing inline assembly in this project isn’t too foreign, as the OCaml native compiler itself emits hand-tuned assembly itself.
There’s a rust-runtime-nightly branch, though
The Rust stable version of the C bytecode interpreter with the computed gotos was ported to a straight loop { match ... }. Performance suffered considerably! So, we decided to introduce Rust nightly in a separate branch to see if we could bring in explicit tail calls to resolve the loss of computed goto in the interpreter.
Benchmarking shows it is actually up to 5% faster than the C computed goto version. So, there’s that.
But why?
At the minimum it was a fun experiment, and that’s reason enough for me. But the thing that actually nags at me: the OCaml runtime is in C because of when it was written, not because anyone recently weighed C against the alternatives and picked it. Rust only appeared two decades after OCaml broke ground. That’s legacy, not an active decision. And the interesting thing about cheap mechanical rewrites is that they turn “it’s in C” from a settled fact back into a live question.
I am not claiming the answer is “therefore, rewrite it in Rust”
The mechanical translation is the easy, cheap part and this whole post is evidence of that. The expensive part is everything the rewrite doesn’t settle, and now that the mechanical cost is near zero, those considerations stop being footnotes and become the whole conversation.
What are the actual tradeoffs? What would it take to trust a mechanical rewrite as much as a hand-written one? If a human had done this exact port by hand over two years, would you trust it more? And if so, what exactly is that delta made of, and can it be closed?
Open questions! And more open than they were a year ago.
Where do we go from here?
I’m really excited this worked at all. I had an incredible amount of fun doing it and if this makes for at least an interesting coffee conversation for some people it is worth it alone.
I don’t seriously expect the OCaml team to switch to this port (it does not comply with their new AI policy, for one). But I would be a little surprised if this experience report caused no update for anyone in the world. Human steered AI rewrites are definitely something that should be part of a programmer’s tool kit, especially if there’s a mature test suite and you have an existing system to use as a source of truth. Even without infinite token budget like AI lab insiders might have, you can still get pretty far!
Future steps from here may include: looking for more opportunities to encapsulate the unsafe parts, verify more of the ecosystem (perhaps by back-porting to a more widely adopted OCaml release), and really see how good a job it did of not introducing new runtime bugs. More speculatively, we could try a toy OCaml fork that breaks compatibility to see how much a performance cost is incurred if we eliminate all unsafe.
Appendix A: Full benchmarks (aka “eh, just about parity, kinda”)
Take these benchmarks with a grain of salt.
The pure Rust stable runtime is at parity with native executable performance but has bigger worst case performance with the interpreter because of lack of computed goto.
With Rust nightly, adding the become keyword for explicit tail calls (ETCs) in lieu of computed goto, interpreter performance is at parity if not faster. One should note that the Rust docs say explicit_tail_calls are “currently incomplete and may not work properly”.
One caveat! Rust shows more win on benchmarks that hammer the GC slow paths, and we wondered if that was just build flags rather than “Rust, the language”. The Rust runtime is one crate built with -O3 and link-time optimization (LTO), while upstream C is -O2 and separate translation units (TUs), no LTO. Upstream calls -O3 “somewhat risky”. For the sake of trying to be more apples-to-apples, we rebuilt trunk’s C runtime with -O3 -flto and re-ran everything. Result: flags explain about a third of the lists win, none of the win in finalise, and on bytecode there’s actually a 9% regression; cross-TU inlining bloats the interpreter loop. Upstream’s default build turns out to be the C runtimes best configuration, which makes it a better baseline.
The results below are upstream’s default flags vs Rust with LTO and -O3.
Again, take these benchmarks with a grain of salt. IMO the takeaway from this is “eh, about parity, kinda”, not “Rust gives a significant boost! We must switch right now!”
Given how posted benchmarks usually go, you should expect the fifth comment down to explain why my benchmarks are completely wrong. Mentally fill it in if it’s missing.
OCaml native and byte-code, no flambda
=== [stock] WALL CLOCK (min ms, lower is better) ===
trunk rust-runtime rust-runtime-nightly
native
almabench 782.61 ms 780.61 ms (0.997x) 788.07 ms (1.007x)
bdd 1553.51 ms 1531.07 ms (0.986x) 1530.86 ms (0.985x)
crout_decomposition 460.74 ms 463.42 ms (1.006x) 473.57 ms (1.028x)
durand_kerner_aberth 569.35 ms 560.39 ms (0.984x) 567.75 ms (0.997x)
kb 1272.94 ms 1269.73 ms (0.997x) 1245.44 ms (0.978x)
soli 2798.39 ms 2850.08 ms (1.018x) 2825.36 ms (1.010x)
binarytrees5 1969.36 ms 1936.12 ms (0.983x) 2044.22 ms (1.038x)
alloc 5916.50 ms 6654.26 ms (1.125x) 6612.67 ms (1.118x)
lists 204.67 ms 177.14 ms (0.865x) 178.34 ms (0.871x)
finalise 351.60 ms 326.26 ms (0.928x) 326.49 ms (0.929x)
TOTAL 15879.67 ms 16549.09 ms (1.042x) 16592.80 ms (1.045x)
bytecode
almabench 5201.01 ms 4754.61 ms (0.914x) 4143.59 ms (0.797x)
bdd 26813.52 ms 36608.96 ms (1.365x) 25355.17 ms (0.946x)
crout_decomposition 6862.29 ms 7705.78 ms (1.123x) 6121.15 ms (0.892x)
durand_kerner_aberth 12174.51 ms 14244.28 ms (1.170x) 12766.48 ms (1.049x)
kb 8106.58 ms 11773.33 ms (1.452x) 8507.30 ms (1.049x)
soli 89943.46 ms 112959.91 ms (1.256x) 74206.08 ms (0.825x)
binarytrees5 8493.53 ms 13385.23 ms (1.576x) 8839.91 ms (1.041x)
alloc 70911.11 ms 128335.46 ms (1.810x) 69083.19 ms (0.974x)
lists 311.60 ms 342.14 ms (1.098x) 310.50 ms (0.996x)
finalise 1448.42 ms 1917.89 ms (1.324x) 1327.77 ms (0.917x)
TOTAL 230266.01 ms 332027.59 ms (1.442x) 210661.13 ms (0.915x)
=== [stock] INSTRUCTIONS (min retired, lower is better) ===
trunk rust-runtime rust-runtime-nightly
native
almabench 11,982,525,444 12,045,767,816 (1.005x) 12,119,109,811 (1.011x)
bdd 23,195,880,693 23,672,243,713 (1.021x) 23,496,447,226 (1.013x)
crout_decomposition 10,540,697,462 10,528,518,080 (0.999x) 10,529,006,913 (0.999x)
durand_kerner_aberth 9,925,393,139 9,931,601,603 (1.001x) 9,933,069,944 (1.001x)
kb 16,180,752,516 16,273,026,843 (1.006x) 16,446,171,810 (1.016x)
soli 80,190,717,599 80,192,319,756 (1.000x) 80,192,251,301 (1.000x)
binarytrees5 36,203,768,664 37,582,691,016 (1.038x) 39,510,055,070 (1.091x)
alloc 130,567,325,652 130,484,776,522 (0.999x) 130,485,472,014 (0.999x)
lists 4,080,393,283 3,694,861,446 (0.906x) 3,694,934,206 (0.906x)
finalise 6,795,543,488 6,445,368,456 (0.948x) 6,339,747,723 (0.933x)
TOTAL 329,662,997,940 330,851,175,251 (1.004x) 332,746,266,018 (1.009x)
bytecode
almabench 65,289,519,768 81,655,602,353 (1.251x) 75,567,878,905 (1.157x)
bdd 322,229,228,258 511,376,407,765 (1.587x) 351,643,764,781 (1.091x)
crout_decomposition 89,704,774,045 126,778,953,728 (1.413x) 109,322,946,879 (1.219x)
durand_kerner_aberth 174,081,597,005 260,836,571,057 (1.498x) 251,548,510,858 (1.445x)
kb 113,945,898,760 179,633,203,899 (1.576x) 136,450,793,255 (1.198x)
soli 1,069,822,905,043 1,640,541,482,411 (1.533x) 1,073,678,761,274 (1.004x)
binarytrees5 130,470,446,317 195,276,263,749 (1.497x) 158,298,506,061 (1.213x)
alloc 1,282,744,491,682 2,064,392,411,405 (1.609x) 1,612,854,371,669 (1.257x)
lists 6,153,104,169 6,883,900,743 (1.119x) 6,582,283,553 (1.070x)
finalise 23,288,098,487 31,858,883,756 (1.368x) 25,418,880,165 (1.091x)
TOTAL 3,277,730,063,534 5,099,233,680,866 (1.556x) 3,801,366,697,400 (1.160x)
=== [stock] CYCLES (min CPU cycles, lower is better) ===
trunk rust-runtime rust-runtime-nightly
native
almabench 4,275,133,794 4,266,969,984 (0.998x) 4,306,909,109 (1.007x)
bdd 8,423,078,413 8,336,325,514 (0.990x) 8,299,324,772 (0.985x)
crout_decomposition 2,436,557,980 2,452,598,643 (1.007x) 2,499,609,328 (1.026x)
durand_kerner_aberth 3,069,143,209 3,026,946,007 (0.986x) 3,061,589,426 (0.998x)
kb 6,910,707,438 6,896,853,517 (0.998x) 6,769,334,136 (0.980x)
soli 14,973,590,917 15,301,024,579 (1.022x) 15,129,903,373 (1.010x)
binarytrees5 10,631,483,033 10,480,155,164 (0.986x) 11,050,267,510 (1.039x)
alloc 31,649,853,386 35,786,537,383 (1.131x) 35,671,107,350 (1.127x)
lists 1,102,518,006 953,712,673 (0.865x) 955,703,816 (0.867x)
finalise 1,878,606,838 1,754,635,089 (0.934x) 1,750,940,930 (0.932x)
TOTAL 85,350,673,014 89,255,758,553 (1.046x) 89,494,689,750 (1.049x)
bytecode
almabench 28,150,819,846 25,755,551,403 (0.915x) 22,276,857,902 (0.791x)
bdd 146,021,329,177 200,336,983,620 (1.372x) 138,615,168,879 (0.949x)
crout_decomposition 37,529,521,884 41,904,393,924 (1.117x) 33,079,620,789 (0.881x)
durand_kerner_aberth 66,297,703,676 77,048,128,983 (1.162x) 68,526,496,042 (1.034x)
kb 44,399,403,914 64,233,052,062 (1.447x) 46,199,410,939 (1.041x)
soli 495,570,241,232 617,323,022,780 (1.246x) 406,057,473,311 (0.819x)
binarytrees5 46,566,265,069 72,906,173,460 (1.566x) 47,896,412,861 (1.029x)
alloc 386,859,994,733 698,738,688,990 (1.806x) 371,099,084,024 (0.959x)
lists 1,687,035,843 1,840,186,059 (1.091x) 1,673,067,897 (0.992x)
finalise 7,894,225,414 10,423,868,878 (1.320x) 7,145,578,643 (0.905x)
TOTAL 1,260,976,540,788 1,810,510,050,159 (1.436x) 1,142,569,171,287 (0.906x)
OCaml native-only with flambda
=== [flambda] WALL CLOCK (min ms, lower is better) ===
trunk rust-runtime rust-runtime-nightly
native
almabench 788.42 ms 778.92 ms (0.988x) 776.47 ms (0.985x)
bdd 1701.32 ms 1713.61 ms (1.007x) 1706.10 ms (1.003x)
crout_decomposition 356.73 ms 361.88 ms (1.014x) 354.48 ms (0.994x)
durand_kerner_aberth 560.64 ms 566.03 ms (1.010x) 562.88 ms (1.004x)
kb 1286.42 ms 1277.64 ms (0.993x) 1254.64 ms (0.975x)
soli 2846.62 ms 2835.48 ms (0.996x) 2826.60 ms (0.993x)
binarytrees5 1974.28 ms 2083.74 ms (1.055x) 1985.55 ms (1.006x)
alloc 6614.50 ms 6621.29 ms (1.001x) 5895.50 ms (0.891x)
lists 201.42 ms 179.15 ms (0.889x) 180.03 ms (0.894x)
finalise 333.90 ms 316.36 ms (0.947x) 312.57 ms (0.936x)
TOTAL 16664.24 ms 16734.09 ms (1.004x) 15854.81 ms (0.951x)
=== [flambda] INSTRUCTIONS (min retired, lower is better) ===
trunk rust-runtime rust-runtime-nightly
native
almabench 12,034,850,839 12,098,421,833 (1.005x) 12,172,354,394 (1.011x)
bdd 27,423,697,883 27,857,846,511 (1.016x) 27,756,041,045 (1.012x)
crout_decomposition 7,427,485,867 7,414,807,272 (0.998x) 7,414,600,792 (0.998x)
durand_kerner_aberth 10,049,245,201 10,055,781,084 (1.001x) 10,056,665,095 (1.001x)
kb 16,031,096,938 16,117,382,051 (1.005x) 16,319,595,413 (1.018x)
soli 80,312,687,759 80,310,809,241 (1.000x) 80,313,394,595 (1.000x)
binarytrees5 35,947,315,834 37,275,057,576 (1.037x) 39,466,919,214 (1.098x)
alloc 130,580,050,276 130,480,782,692 (0.999x) 130,470,459,850 (0.999x)
lists 4,060,601,508 3,674,872,614 (0.905x) 3,675,307,359 (0.905x)
finalise 6,527,081,899 6,188,708,550 (0.948x) 6,066,414,273 (0.929x)
TOTAL 330,394,114,004 331,474,469,424 (1.003x) 333,711,752,030 (1.010x)
=== [flambda] CYCLES (min CPU cycles, lower is better) ===
trunk rust-runtime rust-runtime-nightly
native
almabench 4,310,869,190 4,271,067,276 (0.991x) 4,273,574,788 (0.991x)
bdd 9,216,952,701 9,282,752,592 (1.007x) 9,249,625,080 (1.004x)
crout_decomposition 1,892,517,849 1,921,876,466 (1.016x) 1,889,816,433 (0.999x)
durand_kerner_aberth 3,033,087,434 3,058,715,626 (1.008x) 3,050,420,028 (1.006x)
kb 7,011,708,143 6,956,097,326 (0.992x) 6,830,973,136 (0.974x)
soli 15,289,634,946 15,222,697,901 (0.996x) 15,185,371,100 (0.993x)
binarytrees5 10,696,099,880 11,287,126,205 (1.055x) 10,758,391,992 (1.006x)
alloc 35,697,684,634 35,701,748,838 (1.000x) 31,670,157,017 (0.887x)
lists 1,085,485,730 959,634,997 (0.884x) 970,192,157 (0.894x)
finalise 1,797,386,524 1,694,521,695 (0.943x) 1,683,679,531 (0.937x)
TOTAL 90,031,427,031 90,356,238,922 (1.004x) 85,562,201,262 (0.950x)
Alright, that’s it. Thanks for reading.
Footnotes
- Very online joke re: this comment on the vibe-coded OCaml DWARF PR.
- Most work was done with Claude Opus 4.7. Opus 4.8 was released when we were almost done. The tail-end of the work was done with that.
- Although, thanks to frequency illusion, now that I’ve read about it, I’m starting to notice skiplists in more places. Do see the associated paper!