Your response makes sense. However, adding x86-64 and aarch64 support simultaneously would help ensure that the specific implementation strategy (and the performance numbers you get) is not an artifact of special techniques or tricks available only on the Intel platform, but generalizes to all mainstream hardware platforms.
From my limited understanding, you are using a lot of tricks to make sure multicore GC and continuations are performant for both legacy and future code. I had a quick read of the retrofitting effect handlers paper some time ago and can only imagine that there are a lot of considerations at play here that reach all the way down to the machine architecture level.
It’s possible that the multicore team has already thought this through and has concluded that the implementation details, while important, are not especially peculiar to x86-64, and that it’s just a matter of time and effort to work this out for aarch64…