ADX on llvm-mca: Is its reciprocal throughput 1 or 0.5?


intel.com lists the Skylake throughput (CPI) for ADCX/ADOX as 0.5.

But llvm-mca reports:

$ llvm-mca kkk -mcpu=skylake -timeline --timeline-max-iterations=10 --timeline-max-cycles=999

...
Timeline view:
                    0123456789          0123456789          0123456789   
Index     0123456789          0123456789          0123456789          012

[0,0]     DeER .    .    .    .    .    .    .    .    .    .    .    . .   adcxq   %rax, %rbx
[0,1]     D=eER.    .    .    .    .    .    .    .    .    .    .    . .   adoxq   %rcx, %rdx
[0,2]     D==eER    .    .    .    .    .    .    .    .    .    .    . .   adcxq   %rsp, %rbp
[0,3]     D===eER   .    .    .    .    .    .    .    .    .    .    . .   adoxq   %rsi, %rdi
[0,4]     D====eER  .    .    .    .    .    .    .    .    .    .    . .   adcxq   %r8, %r9
[0,5]     D=====eER .    .    .    .    .    .    .    .    .    .    . .   adoxq   %r10, %r11
[1,0]     .D=====eER.    .    .    .    .    .    .    .    .    .    . .   adcxq   %rax, %rbx
[1,1]     .D======eER    .    .    .    .    .    .    .    .    .    . .   adoxq   %rcx, %rdx
[1,2]     .D=======eER   .    .    .    .    .    .    .    .    .    . .   adcxq   %rsp, %rbp
[1,3]     .D========eER  .    .    .    .    .    .    .    .    .    . .   adoxq   %rsi, %rdi
[1,4]     .D=========eER .    .    .    .    .    .    .    .    .    . .   adcxq   %r8, %r9
[1,5]     .D==========eER.    .    .    .    .    .    .    .    .    . .   adoxq   %r10, %r11

which executes only one instruction per cycle. Why?


This must be a bug in llvm-mca:

Index     0123456789 

[0,0]     DeER .    .   adcxq   %rax, %rbx
[0,1]     D=eER.    .   adoxq   %rcx, %rdx
[0,2]     D==eER    .   adcxq   %rsp, %rbp
[0,3]     D===eER   .   adoxq   %rsi, %rdi
[0,4]     .D===eER  .   adcxq   %r8, %r9
[0,5]     .D====eER .   adoxq   %r10, %r11
[0,6]     .DeE----R .   decq    %r15
[0,7]     .D=eE---R .   jne z
[1,0]     . DeE---R .   adcxq   %rax, %rbx
[1,1]     . D=eE--R .   adoxq   %rcx, %rdx
[1,2]     . D==eE-R .   adcxq   %rsp, %rbp
[1,3]     . D===eER .   adoxq   %rsi, %rdi
[1,4]     .  D===eER.   adcxq   %r8, %r9
[1,5]     .  D====eER   adoxq   %r10, %r11
[1,6]     .  DeE----R   decq    %r15
[1,7]     .  D=eE---R   jne z

After decq, [1,0] adcxq is shown executing on cycle 3, even though it depends on a result that is only produced on cycle 5 (decq does not write CF, so the carry chain still runs through the previous adcxq). adoxq, on the other hand, can legitimately execute early. This looks like a separate issue, since it also applies to an inc in an adcq chain. The LLVM community confirmed that "We only have an EFLAGS register modeled." and that fixing this should fix both.
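
To illustrate why modeling a single EFLAGS register gives 1 cycle per instruction instead of 0.5, here is a rough toy model in Python. It is only a sketch, not how llvm-mca works internally: the function name `cycles_per_instruction`, the 1-cycle latency, and the limit of 2 instructions started per cycle are all assumptions made for illustration. It only tracks the flag dependency (adcx uses CF, adox uses OF) and ignores the general-purpose registers.

```python
from collections import defaultdict


def cycles_per_instruction(n_instrs, split_flags, ports=2, latency=1):
    """Toy scheduler for an alternating adcx/adox stream (hypothetical model).

    split_flags=False: every instruction reads and writes one "EFLAGS" resource.
    split_flags=True:  adcx only touches "CF", adox only touches "OF".
    """
    ready_at = {}               # flag name -> cycle its latest value is available
    started = defaultdict(int)  # cycle -> how many instructions started then
    finish = 0
    for i in range(n_instrs):
        is_adcx = (i % 2 == 0)
        if split_flags:
            flag = "CF" if is_adcx else "OF"
        else:
            flag = "EFLAGS"
        # Earliest start: when the flag input is ready and a port is free.
        start = ready_at.get(flag, 0)
        while started[start] >= ports:
            start += 1
        started[start] += 1
        ready_at[flag] = start + latency
        finish = max(finish, start + latency)
    return finish / n_instrs


print(cycles_per_instruction(1200, split_flags=False))  # single flags register
print(cycles_per_instruction(1200, split_flags=True))   # CF and OF tracked apart
```

With one shared flags resource every instruction waits for the one before it, so the stream is a single serial chain. With CF and OF tracked separately, the adcx chain and the adox chain overlap and two instructions can start per cycle.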
