Derived type access time vs arrays access time

179 Views Asked by At

I have a question related to access times of multidimensional arrays or derived types. I wrote an algorithm which works quite well. However, the main part of this algorithm uses % to reference some data through different types. For example here is the most expensive loop in my code:

   do c_elem = 1 , no_elems
        call tmp_element%init_element(no_fl)
        tmp_element%Gij(no_fl,1) = e_Gxa(c_elem) 
        tmp_element%Gij(no_fl,2) = e_Gax(c_elem) 
        GAX = tmp_element%Gij(no_fl,1)/GAA
        GXA = tmp_element%Gij(no_fl,2)/GAA
        do k = 1 , no_fl-1
            s1 = this%flNew%Iij(k)
            i1 = this%nodes(s1)%fl_id
            tmp_element%Gij(k,1) = this%elOld(c_elem)%Gij(i1,1) + GAX*this%flNew%Gij(no_fl,k)
            tmp_element%Gij(k,2) = this%elOld(c_elem)%Gij(i1,2) + this%flNew%Gij(k,no_fl)*GXA
        enddo
        call this%elOld(c_elem)%init_element(no_fl)
        this%elOld(c_elem)%Gij = tmp_element%Gij
   enddo

As you see most of the time I access some variables through % and most of the arrays there are the "allocatable" ones, complex or integer. I've read that sometimes using derived types may lead to slow down of the computation, so I decided to rewrite my algorithm to work directly on arrays without types. The same loop which uses now just arrays looks like that:

    do c_elem = 1 , no_elems
        nGij1(no_fl,c_elem) = e_Gxa(c_elem)
        nGij2(no_fl,c_elem) = e_Gax(c_elem)
        GAX = e_Gxa(c_elem)/GAA
        GXA = e_Gax(c_elem)/GAA
        do k = 1 , no_fl-1
            i1 = fl_id(k)
            nGij1(k,c_elem) = oGij1(i1,c_elem) + GAX*fl_new_Gij(no_fl,k)
            nGij2(k,c_elem) = oGij2(i1,c_elem) + fl_new_Gij(k,no_fl)*GXA
        enddo
    enddo

I've expected that the code above will run faster than code where I used derived types. So I've checked the timings. Here are CPU times for this part of code:

  1. 0.1s for the derived types method (first code)
  2. 3.2s for the arrays method (second code) (~30 times slower )

I use only the -O2 option during the compilation and ifort Intel compiler. In the codes above no_elems >> no_fl and most of the arrays and types are complex. It is clear that the only difference in the both algorithms are the memory access times.

Why is there a huge difference in the execution time for both cases?

Edit #1

I've tried to make a short example, however I could not manage to obtain similar results like with the full code... the times for both cases were almost the same -- arrays were slightly faster.

I can show you the thing which is completely strange for me: I have a type:

type element
  complex*16,allocatable :: Gij(:,:) 
  integer    :: no_sites    
  complex*16 :: Gp,Gxa,Gax  
  integer    :: i,j
  contains
  procedure,pass(this) :: free_element
  procedure,pass(this) :: init_element
  procedure,pass(this) :: copy_element
end type element

Then, at some point of one big loop I calculate values like that:

  ! derived type way
  this%elOld(c_elem)%Gax = GAX
  this%elOld(c_elem)%Gxa = GXA
  this%elOld(c_elem)%Gp  = this%elOld(c_elem)%Gp + GXA*GAX/GAA
  ! array way to do it
  e_Gax(c_elem) = GAX
  e_Gxa(c_elem) = GXA
  e_Gp (c_elem) = e_Gp (c_elem) + GXA*GAX/GAA

where:

  complex(16),allocatable :: e_Gax(:),e_Gxa(:),e_Gp(:)

and elOld is an array of elements defined in another type

  type(element),allocatable :: elOld(:),elNew(:)

Then I have loop which I already showed you, but if I will do it in that way:

  tmp_element%Gij(no_fl,1) = e_Gxa(c_elem) 
  tmp_element%Gij(no_fl,2) = e_Gax(c_elem) 
  GAX = e_Gxa(c_elem)/GAA
  GXA = e_Gxa(c_elem)/GAA

instead of that:

  tmp_element%Gij(no_fl,1) = this%elOld(c_elem)%Gxa
  tmp_element%Gij(no_fl,2) = this%elOld(c_elem)%Gax
  GAX = tmp_element%Gij(no_fl,1)/GAA
  GXA = tmp_element%Gij(no_fl,2)/GAA

program becomes noticeable slower... (with type(elemen) :: tmp_element), which means that access to simple array e_Gxa(:) is slower than access to complex variable Gxa in the array of types i.e. this%elOld(:)%Gxa. Changing just few lines of the code may change visible execution time. I've checked it with gfortran but the results were the same. Here is the part of the full code which I'm talking about: modskminv_fast.f90. Sorry for posting full module, but I could not obtain the same results when I splitted the code to small pieces.

1

There are 1 best solutions below

0
On

The fault was obviously on my side. Just by accident I declared all the complex arrays as:

complex(16)

instead of

complex*16 or complex(kind=8)

Sorry for bothering you. Now, it works like a charm :)