Is a mutable project structure inherently slower?

I’m writing an emulator in OCaml. To emulate the CPU registers, I’ve defined a register file struct:

type registers = {
  af : u16 ref;
  bc : u16 ref;
  de : u16 ref;
  hl : u16 ref;
  sp : u16 ref;
  pc : u16 ref;

According to my profiler, register reads and writes are some of the most frequently called functions in my emulator, e.g.

let set_low r (v' : [> `R16 of int | `R8 of char ]) =
  let v = match v' with `R8 c -> u16_of_u8 c | `R16 i -> i in
  r := !r land 0xFF00 lor (v land 0xFF)

let set_f regs f = set_low f

My emulator’s performance is roughly 50% slower than I would’ve expected the minimum to be. The only looping action in the entire codebase is the decode/execute loop and a general timing loop surrounding that. My question is: if I restructured the project to make better use of immutability and recursion, would the compiler regain any amount of reasoning to make better optimizations? Has structuring my emulator around mutable data significantly weakened the language’s reasoning abilities?

If you make registers immutable you will put a lot of pressure on the gc, I doubt it will be more efficient.

However you current definition of registers is likely sub optimal. Basically you have an immutable record of references. So that entails two memory accesses to access a register. You should try to use a record with mutable fields.

type u16 = int
type registers = 
{ mutable af : u16;
  mutable bc : u16;
  mutable de : u16;
  mutable hl : u16;
  mutable sp : u16;
  mutable pc : u16; }

let set_f regs v = 
   let v = match v with `R8 c -> u16_of_u8 c | `R16 i -> i in <- land 0xFF00 lor (v land 0xFF)
1 Like

I appreciate the advice @dbuenzli. As you were typing that I tried the same switch to mutable struct fields but sadly found no performance increase.

An unrelated idea, but because the a and f registers are accessed much more than the others, I’m considering splitting them into individual fields to avoid the masking logic that I’m utilizing right now.

I think it is hard to answer your question without knowing the general shape of your code, how you are timing it, etc. In terms of the data structure, you won’t get more efficient than what @dbuenzli suggested, but as mentioned, there may be many other factors that come into play.

Also, what does “50% slower than I would’ve exepected the minimum to be” mean?


I assume you are compiling the program with the native-code compiler (ocamlopt).

Given that you are writing an emulator, I assume you can read assembly language(s). You might find useful for understanding how OCaml compiles code.

Aside from using mutable fields instead of refs, you might also want to avoid using (polymorphic) variants to specify the values. It is likely that they will be heap allocated.

I assume you are emulating a 16-bit machine? If you want to be able to treat 8 and 16-bit values generically, you could encode the information in a single OCaml int so that no allocations (or write barriers) are needed. Something like this:

module R : sig
  type t [@@immediate]
  val make_8 : int -> t
  val make_16 : int -> t
  val is_16 : int -> bool
  val to_int : t -> int
  (* ... *)
end = struct
  type t = int
  let make_8 x = (x land 0xff) lsl 1
  let make_16 x = ((x land 0xffff) lsl 1) lor 1
  let is_16 x = (x land 0x1) <> 0
  let to_int x = x lsr 1
  (* ... *)

Most likely, no. Immutability is slightly better for the compiler (particularly flambda), but recursion remains complicated to optimise around. Flambda 2 will have more capabilities around that, but it’s not released yet.

The OCaml compiler can do a bit of optimisation around mutable structures. However, these optimisations mostly stop at function calls, so if you want to take full advantage of them you’ll need to be careful not to send mutable structures through functions (either as parameters or implicitly).
Flambda can mitigate that to some extent by optimising during and after inlining.

You really need to profile the code to figure out where the issue is.

Emulators are very demanding, so a seemingly small compilation issue can cause big slowdowns.

The register file may be more efficient to implement with Bigarrays, since that takes care of the write barrier when writing to them.


Have you profiled where in your codebase is it spending more time than you expected, and over how long of a trace? We can’t say much without hard data, but aside from mutable, I see you’re using [> `R16 of int | `R8 of char ], so you’re probably allocating a 24 byte block (assuming 64-bit target) on each register write, unless you’re passing just literals.

Does your hot loop have any other frequent allocations? If so, you might be triggering the minor GC too often and losing throughput.

1 Like

Btw. there are a few gameboy emulators written in OCaml you may want to check out what they do:

  1. Writing a Game Boy Emulator in OCaml - The Linoscope Machine
  2. GitHub - Engil/Goodboy: A pure OCaml Gameboy emulator

Is it possible to use array to store registers, use index access, only number each register at the code level.