Is a mutable project structure inherently slower?

CharlesAverill · August 9, 2023, 11:33pm

I’m writing an emulator in OCaml. To emulate the CPU registers, I’ve defined a register file struct:

type registers = {
  af : u16 ref;
  bc : u16 ref;
  de : u16 ref;
  hl : u16 ref;
  sp : u16 ref;
  pc : u16 ref;
}

According to my profiler, register reads and writes are some of the most frequently called functions in my emulator, e.g.

let set_low r (v' : [> `R16 of int | `R8 of char ]) =
  let v = match v' with `R8 c -> u16_of_u8 c | `R16 i -> i in
  r := !r land 0xFF00 lor (v land 0xFF)

let set_f regs f = set_low regs.af f

My emulator’s performance is roughly 50% slower than I would’ve expected the minimum to be. The only looping action in the entire codebase is the decode/execute loop and a general timing loop surrounding that. My question is: if I restructured the project to make better use of immutability and recursion, would the compiler regain any amount of reasoning to make better optimizations? Has structuring my emulator around mutable data significantly weakened the language’s reasoning abilities?

dbuenzli · August 10, 2023, 12:01am

If you make registers immutable you will put a lot of pressure on the gc, I doubt it will be more efficient.

However you current definition of registers is likely sub optimal. Basically you have an immutable record of references. So that entails two memory accesses to access a register. You should try to use a record with mutable fields.

type u16 = int
type registers = 
{ mutable af : u16;
  mutable bc : u16;
  mutable de : u16;
  mutable hl : u16;
  mutable sp : u16;
  mutable pc : u16; }

let set_f regs v = 
   let v = match v with `R8 c -> u16_of_u8 c | `R16 i -> i in
   regs.af <- regs.af land 0xFF00 lor (v land 0xFF)

CharlesAverill · August 10, 2023, 12:05am

I appreciate the advice @dbuenzli. As you were typing that I tried the same switch to mutable struct fields but sadly found no performance increase.

An unrelated idea, but because the a and f registers are accessed much more than the others, I’m considering splitting them into individual fields to avoid the masking logic that I’m utilizing right now.

nojb · August 10, 2023, 5:40am

I think it is hard to answer your question without knowing the general shape of your code, how you are timing it, etc. In terms of the data structure, you won’t get more efficient than what @dbuenzli suggested, but as mentioned, there may be many other factors that come into play.

Also, what does “50% slower than I would’ve exepected the minimum to be” mean?

Cheers,
Nicolas

vlaviron · August 10, 2023, 11:24am

Most likely, no. Immutability is slightly better for the compiler (particularly flambda), but recursion remains complicated to optimise around. Flambda 2 will have more capabilities around that, but it’s not released yet.

The OCaml compiler can do a bit of optimisation around mutable structures. However, these optimisations mostly stop at function calls, so if you want to take full advantage of them you’ll need to be careful not to send mutable structures through functions (either as parameters or implicitly).
Flambda can mitigate that to some extent by optimising during and after inlining.

bluddy · August 10, 2023, 11:49am

You really need to profile the code to figure out where the issue is.

Emulators are very demanding, so a seemingly small compilation issue can cause big slowdowns.

The register file may be more efficient to implement with Bigarrays, since that takes care of the write barrier when writing to them.

amongonz · August 10, 2023, 4:14pm

Have you profiled where in your codebase is it spending more time than you expected, and over how long of a trace? We can’t say much without hard data, but aside from mutable, I see you’re using [> `R16 of int | `R8 of char ], so you’re probably allocating a 24 byte block (assuming 64-bit target) on each register write, unless you’re passing just literals.

Does your hot loop have any other frequent allocations? If so, you might be triggering the minor GC too often and losing throughput.

dbuenzli · August 10, 2023, 4:19pm

Btw. there are a few gameboy emulators written in OCaml you may want to check out what they do:

Muqiu-Han · August 15, 2023, 5:13am

Is it possible to use array to store registers, use index access, only number each register at the code level.

Topic		Replies	Views
Why is it five times slower to use mutable fields in records? Learning performance	4	796	December 30, 2020
Type for mutable sets Learning	4	704	May 31, 2022
Register allocation super-linear timing Ecosystem	4	1319	January 19, 2018
Immutable Union-Find Structure Learning datastructures	4	2207	November 7, 2018
Blog post: Writing a Game Boy emulator in OCaml Learning blog	12	3021	January 21, 2022

Is a mutable project structure inherently slower?

Related topics