I have some simple structs:
struct ab { double a,b; };
struct abcd { double a,b,c,d; };
struct ch
{
...
std::vector<abcd> x;
std::vector<size_t> ir;
...
};
And code:
ch l;
std::vector<ab> x;
double c,f;
...
for(size_t i = ... )
{
...
l.x[i].c = (l.x[i].c / c) + f*x[l.ir[i]].a; // line#1
...
}
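Here is a compilable sketch of the loop above. The `ch` in it keeps only the two members that line#1 touches. Several parts stand in for elided code and are assumptions: the loop bounds (0 to `l.x.size()`), the function name `update`, and the precondition check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ab { double a, b; };
struct abcd { double a, b, c, d; };

// Reduced ch: only the members line#1 touches (the rest is elided above).
struct ch {
    std::vector<abcd> x;
    std::vector<std::size_t> ir;  // ir[i] indexes the separate vector<ab> x
};

// Assumption: the elided loop runs over every element of l.x.
void update(ch& l, const std::vector<ab>& x, double c, double f) {
    assert(l.ir.size() >= l.x.size());
    for (std::size_t i = 0; i < l.x.size(); ++i) {
        // line#1: sequential read-modify-write of l.x[i].c, plus an indirect
        // read of x[l.ir[i]].a whose address depends on the loaded index
        l.x[i].c = (l.x[i].c / c) + f * x[l.ir[i]].a;
    }
}
```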
CodeXL shows that line#1 is one of the most expensive lines, and about 60% of line#1's time is spent on
mov eax,[edx+eax]
How can I optimize line#1?
Why is the mov operation more expensive than mul and div?
Update: full disassembly of line#1 from CodeXL:
l.x[i].c = (l.x[i].c / c) + f*x[l.ir[i]].a; => 15.871% of function time
;;
mov ecx,[ebx+4ch]
lea edx,[edi*4+00000000h] => 0.99194%
shl edi,05h
mov eax,[ebx+1ch]
movsd xmm0,[ecx+edi+10h]
divsd xmm0,xmm2 => 1.17793%
mov eax,[edx+eax] => 10.0434%
add eax,eax
movsd xmm1,[esi+eax*8]
mulsd xmm1,xmm4
addsd xmm1,xmm0 => 1.30192%
movsd [ecx+edi+10h],xmm1 => 2.35586%
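The offsets in this disassembly follow from the struct layouts, and the checks below spell that out as a sketch. The register roles are inferred from the instruction pattern and are not reported by CodeXL: edi holds i, ecx and eax are loaded with the data pointers of l.x and l.ir, and esi holds x.data(). Under that reading, mov eax,[edx+eax] loads the index l.ir[i]. The following movsd xmm1,[esi+eax*8] then uses that index to load x[l.ir[i]].a.

```cpp
#include <cstddef>  // offsetof

struct ab { double a, b; };
struct abcd { double a, b, c, d; };

// shl edi,05h multiplies i by 32 = sizeof(abcd) to reach l.x[i].
static_assert(sizeof(abcd) == 32, "abcd should hold exactly four doubles");
// [ecx+edi+10h] addresses l.x[i].c: c starts 16 (0x10) bytes into abcd.
static_assert(offsetof(abcd, c) == 0x10, "c should follow a and b");
// add eax,eax then [esi+eax*8] scales l.ir[i] by 2*8 = 16 = sizeof(ab),
// and a is at offset 0, so that load reads x[l.ir[i]].a.
static_assert(sizeof(ab) == 16, "ab should hold exactly two doubles");
static_assert(offsetof(ab, a) == 0, "a should be the first member");
// lea edx,[edi*4] scales i by 4 = sizeof(size_t) on the 32-bit build; that
// size is not asserted here because it is 8 on 64-bit targets.
```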
Update: Microsoft Visual Studio 2013, Release build, 32-bit.
mul and div are fast because their operands are already available. mov eax,[edx+eax] requires an operand from memory. Is it in cache or prefetched? I suspect this particular mov is from your x[l.ir[i]] expression: x is sufficiently large to be uncached, and l.ir[i] is sufficiently non-linear to defeat the prefetcher. That means you're waiting for main memory.
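A common remedy for an indirect access like x[l.ir[i]] is software prefetching. The idea is to request the element that will be needed a few iterations later, so that its cache miss overlaps with the current work. This is a sketch, not a measured fix. The names `prefetch_read`, `kDistance` and `update_prefetch` are hypothetical, as is the distance of 16. Whether prefetching helps at all, and at what distance, has to be found by profiling. `__builtin_prefetch` is the GCC/Clang builtin, and `_mm_prefetch` is the x86 intrinsic that MSVC provides. `const` is used instead of `constexpr` because Visual Studio 2013 does not support `constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <xmmintrin.h>  // _mm_prefetch on MSVC (x86/x64)
#endif

struct ab { double a, b; };
struct abcd { double a, b, c, d; };
struct ch {  // reduced: only the members line#1 uses
    std::vector<abcd> x;
    std::vector<std::size_t> ir;
};

// Hint the CPU to start loading the cache line containing p.
// Falls back to a no-op where no prefetch primitive is known.
inline void prefetch_read(const void* p) {
#if defined(__GNUC__)
    __builtin_prefetch(p, 0, 3);  // read access, keep in all cache levels
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    _mm_prefetch(static_cast<const char*>(p), _MM_HINT_T0);
#else
    (void)p;
#endif
}

// Hypothetical lookahead in iterations; tune by measurement.
const std::size_t kDistance = 16;

void update_prefetch(ch& l, const std::vector<ab>& x, double c, double f) {
    const std::size_t n = l.x.size();
    assert(l.ir.size() >= n);
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDistance < n) {
            // Request the x element that iteration i + kDistance will read.
            prefetch_read(&x[l.ir[i + kDistance]]);
        }
        l.x[i].c = (l.x[i].c / c) + f * x[l.ir[i]].a;  // line#1, unchanged
    }
}
```

The arithmetic of line#1 is untouched. Only the memory access pattern is given a head start, so the results match the original loop exactly.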