skral
August 20, 2020, 1:27pm
1
For x86-64, ocamlopt emits code like leaq 1(%rax,%rax),%rax for tagging integers.
However, for all Intel Core processors since 2011, leaq 1(,%rax,2),%rax would be better.
Comparison:
PRO: shorter latency (1 cycle vs. 3 cycles)
PRO: higher throughput (2 per cycle vs. 1 per cycle)
CON: larger encoding (8 bytes vs. 5 bytes)
For AMD “Ryzen” processors, the above is not an optimization (same latency and throughput).
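To make the comparison concrete, here is a minimal C sketch of the two forms. It assumes the tagged representation of an integer n is 2n + 1, which is what both leaq instructions compute. The function and array names are hypothetical, and the byte sequences are hand-assembled encodings that match the 5-byte and 8-byte sizes quoted above; they are not taken from ocamlopt output.

```c
#include <assert.h>
#include <stdint.h>

/* Base + index + displacement, mirroring leaq 1(%rax,%rax),%rax.
   Unsigned arithmetic is used so that wraparound is well defined. */
static int64_t tag_base_index(int64_t n) {
    return (int64_t)((uint64_t)n + (uint64_t)n + 1u);
}

/* Scaled index + displacement, mirroring leaq 1(,%rax,2),%rax. */
static int64_t tag_scaled_index(int64_t n) {
    return (int64_t)((uint64_t)n * 2u + 1u);
}

/* Hand-assembled encodings (an assumption of this sketch).
   With no base register, the SIB form requires a 32-bit displacement,
   which is where the three extra bytes come from. */
static const uint8_t enc_base_index[] = {
    0x48, 0x8D, 0x44, 0x00, 0x01                    /* REX.W, LEA, ModRM, SIB, disp8 */
};
static const uint8_t enc_scaled_index[] = {
    0x48, 0x8D, 0x04, 0x45, 0x01, 0x00, 0x00, 0x00  /* REX.W, LEA, ModRM, SIB, disp32 */
};

static_assert(sizeof(enc_base_index) == 5, "base+index form is 5 bytes");
static_assert(sizeof(enc_scaled_index) == 8, "scaled-index form is 8 bytes");
```

Both functions return the same value for every input, so the choice between the two instructions affects only timing and code size, not results.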
Please consider adapting the code generator for Intel64 in ocamlopt. Thank you!
PS. For details on the timing, see https://www.agner.org/optimize/
nojb
August 20, 2020, 1:44pm
2
For requests involving the compiler, please open an issue at https://github.com/ocaml/ocaml/issues
Regarding the request itself, as a general rule the compiler team has historically avoided optimizations that depend on specific processor models, as they add testing burden (more code paths to check), complexity (the need to check for specific models in the code generator), etc.
Cheers,
Nicolás
skral
August 20, 2020, 2:09pm
3
Will do. And I will try to address the issues you mentioned. Very helpful!
Thank you for your reply.
I’d prefer it if participating in the development didn’t require a GitHub account. I’m sure the issues are known to most here.