Tutorial on writing an OCaml bytecode interpreter from scratch in Rust?

Has anyone here written an OCaml bytecode interpreter from scratch ?

Only the bytecode interpreter part. Not the parser / typechecker / compiler. No optimizations. Just a dumb bytecode interpreter.

I feel like many questions I have regarding (1) transpiling, (2) OCaml runtime, (3) OCaml ↔ other languages would be auto resolved if I had a correct mental model of the OCaml bytecode interpreter.

Has anyone written one from scratch? Any advice / insights ?

Thanks!

PS: I’m okay with using a dumb mark & sweep, global-stop-the-world GC. The main goal is educational, not production use.

1 Like

Have you read Xavier’s Master’s Thesis: https://hal.inria.fr/inria-00070049/document

?

2 Likes

I have not, thanks for the reference !

I’m also curious about that, because the actual OCaml interpreter has an
annoying limitation: it’s not reentrant, which means you can’t really use it as
an embedded interpreter (and to use the OCaml toplevel you also need
your program to itself be compiled to bytecode).

An alternative, no-global-state, simple to embed interpreter could be
useful for programs extensible with/scriptable with OCaml, imho.

3 Likes

OCaml’s bytecode is a simple 1-register stack machine, so most of its instructions are trivial to implement. But here are a few things to look for:

  • There are macro-instructions. For example, PUSHACC is actually the sequence of the instructions PUSH and ACC, i.e., it pushes the accumulator register on the stack and then loads it from a random position on the stack.
  • The caller of a function might push more arguments on the stack than what the called function expects. The interpreter needs to keep track of their number. Indeed, if there is any extra argument, the RETURN instruction does not give control back to the caller. Instead, it assumes that the current accumulator is a closure and gives it immediate control.
  • The GRAB instruction checks that the caller passed enough arguments onto the stack. If not, it creates a closure, stores all the available arguments in it, points the closure at the instruction just before GRAB (which is assumed to be a RESTART instruction). Then, it returns to the caller, as if executing a RETURN instruction.
3 Likes

In camlboot there’s is an OCaml interpreter. It is not an OCaml bytecode interpreter but it may help depending on what you’re trying to achieve.

Quoting the bottom of the README

Expect this to take some time: on an 8-core machine, it took about 16 hours of CPU time, and 4 hours of wall-clock time.

Is this a typo? WTF is going on? 1. How is this taking 4 hrs on 8 cores; and who has the patience to debug something that takes 4 hrs on 8 cores ?

It’s the time needed to bootstrap the full OCaml compiler, see the paper.

1 Like

I just found this topic.

Some years ago I wrote a basic bytecode interpreter in Rust. My interpreter is very incomplete. I just made it run an example program I had.

I took as reference the interpreter found in obytelib. For garbage collection, I wrote a simple mark & sweep. Another important part was a deserializer of ocaml values. The bytecode includes serialized constant values.

My objective was to have a small interpreter that could be embedded and having multiple instances of it, just like Lua.

As I mentioned, I did not finish the project, but if you are still interested, I could share the code.

1 Like