I have a question related to access times of multidimensional arrays or derived types. I wrote an algorithm which works quite well. However, the main part of this algorithm uses %
to reference some data through different types. For example here is the most expensive loop in my code:
do c_elem = 1 , no_elems
call tmp_element%init_element(no_fl)
tmp_element%Gij(no_fl,1) = e_Gxa(c_elem)
tmp_element%Gij(no_fl,2) = e_Gax(c_elem)
GAX = tmp_element%Gij(no_fl,1)/GAA
GXA = tmp_element%Gij(no_fl,2)/GAA
do k = 1 , no_fl-1
s1 = this%flNew%Iij(k)
i1 = this%nodes(s1)%fl_id
tmp_element%Gij(k,1) = this%elOld(c_elem)%Gij(i1,1) + GAX*this%flNew%Gij(no_fl,k)
tmp_element%Gij(k,2) = this%elOld(c_elem)%Gij(i1,2) + this%flNew%Gij(k,no_fl)*GXA
enddo
call this%elOld(c_elem)%init_element(no_fl)
this%elOld(c_elem)%Gij = tmp_element%Gij
enddo
As you see most of the time I access some variables through %
and most of the arrays there are the "allocatable" ones, complex or integer. I've read that sometimes using derived types may lead to slow down of the computation, so I decided to rewrite my algorithm to work directly on arrays without types. The same loop which uses now just arrays looks like that:
do c_elem = 1 , no_elems
nGij1(no_fl,c_elem) = e_Gxa(c_elem)
nGij2(no_fl,c_elem) = e_Gax(c_elem)
GAX = e_Gxa(c_elem)/GAA
GXA = e_Gax(c_elem)/GAA
do k = 1 , no_fl-1
i1 = fl_id(k)
nGij1(k,c_elem) = oGij1(i1,c_elem) + GAX*fl_new_Gij(no_fl,k)
nGij2(k,c_elem) = oGij2(i1,c_elem) + fl_new_Gij(k,no_fl)*GXA
enddo
enddo
I've expected that the code above will run faster than code where I used derived types. So I've checked the timings. Here are CPU times for this part of code:
- 0.1s for the derived types method (first code)
- 3.2s for the arrays method (second code) (~30 times slower )
I use only the -O2
option during the compilation and ifort Intel compiler. In the codes above no_elems >> no_fl and most of the arrays and types are complex. It is clear that the only difference in the both algorithms are the memory access times.
Why is there a huge difference in the execution time for both cases?
Edit #1
I've tried to make a short example, however I could not manage to obtain similar results like with the full code... the times for both cases were almost the same -- arrays were slightly faster.
I can show you the thing which is completely strange for me: I have a type:
type element
complex*16,allocatable :: Gij(:,:)
integer :: no_sites
complex*16 :: Gp,Gxa,Gax
integer :: i,j
contains
procedure,pass(this) :: free_element
procedure,pass(this) :: init_element
procedure,pass(this) :: copy_element
end type element
Then, at some point of one big loop I calculate values like that:
! derived type way
this%elOld(c_elem)%Gax = GAX
this%elOld(c_elem)%Gxa = GXA
this%elOld(c_elem)%Gp = this%elOld(c_elem)%Gp + GXA*GAX/GAA
! array way to do it
e_Gax(c_elem) = GAX
e_Gxa(c_elem) = GXA
e_Gp (c_elem) = e_Gp (c_elem) + GXA*GAX/GAA
where:
complex(16),allocatable :: e_Gax(:),e_Gxa(:),e_Gp(:)
and elOld is an array of elements defined in another type
type(element),allocatable :: elOld(:),elNew(:)
Then I have loop which I already showed you, but if I will do it in that way:
tmp_element%Gij(no_fl,1) = e_Gxa(c_elem)
tmp_element%Gij(no_fl,2) = e_Gax(c_elem)
GAX = e_Gxa(c_elem)/GAA
GXA = e_Gxa(c_elem)/GAA
instead of that:
tmp_element%Gij(no_fl,1) = this%elOld(c_elem)%Gxa
tmp_element%Gij(no_fl,2) = this%elOld(c_elem)%Gax
GAX = tmp_element%Gij(no_fl,1)/GAA
GXA = tmp_element%Gij(no_fl,2)/GAA
program becomes noticeable slower... (with type(elemen) :: tmp_element), which means that access to simple array e_Gxa(:) is slower than access to complex variable Gxa in the array of types i.e. this%elOld(:)%Gxa. Changing just few lines of the code may change visible execution time. I've checked it with gfortran but the results were the same. Here is the part of the full code which I'm talking about: modskminv_fast.f90. Sorry for posting full module, but I could not obtain the same results when I splitted the code to small pieces.
The fault was obviously on my side. Just by accident I declared all the complex arrays as:
instead of
Sorry for bothering you. Now, it works like a charm :)