Proceeding from one of my earlier posts, I noticed that the operation np.add.at(A, indices, B)
is a lot slower than A[indices] += B
.
def fast(A):
n = A.shape[0]
retval = np.zeros(2*n-1)
for i in range(n):
retval[slice(i,i+n)] += A[i, :]
return retval
def slow(A):
n = A.shape[0]
retval = np.zeros(2*n-1)
for i in range(n):
np.add.at(retval, slice(i,i+n), A[i, :])
return retval
def slower(A):
n = A.shape[0]
retval = np.zeros(2*n-1)
indices = np.arange(n)
indices = indices[:,None] + indices
np.add.at(retval, indices, A) # bottleneck here
return retval
My timings:
A = np.random.randn(10000, 10000)
timeit(lambda: fast(A), number=10) # 0.8852798199995959
timeit(lambda: slow(A), number=10) # 56.633683917999406
timeit(lambda: slower(A), number=10) # 57.763389584000834
Clearly, using the __iadd__
is a lot faster. However, the documentation for np.add.at
states:
Performs unbuffered in place operation on operand ‘a’ for elements specified by ‘indices’. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once.
Why is np.add.at
so slow?
What is the use-case for this function?
add.at
was intended for cases where indices contain duplicates and+=
does not produce the desired resultWith
add.at
:Same result as with an explicit iteration:
Some timings:
add.at
is slower than+=
but better than the python iteration.We could test variants such as
A[:4]+1
,A[:4]+=1
, etc.Anyways, don't use the
add.at
variant if you don't need it.edit
Your example, simplified a bit:
So you are adding values to overlapping slices. We could replace the slices with an array:
No need to add your 'slow' case,
add.at
with slices because the indices don't have duplicates.Creating all the indexes.
+=
does not work because of bufferingadd.at
solves that:And the full python iteration:
Now some timings.
The base line:
Advanced indexing slows this down some:
If it worked, one advanced-indexing += would be fastest:
Full python iteration is about the same as the looped arange case:
add.at
is slower than the flat +=, but still better than the base line:In my smaller test, your
slower
does best. But it's possible that it does not scale as well as base/fast. Yourindices
is much larger. Often for very large arrays, iteration on one dimension is faster due to reduced memory management load.