Why does Anaconda Accelerate compute dot products more slowly than plain NumPy on Python 3? I'm using Accelerate 2.3.1 with accelerate_cudalib 2.0 installed, on Python 3.5.2 and 64-bit Windows 10.
import time

import numpy as np
from accelerate.cuda.blas import dot as gpu_dot


def numpydot():
    # Time 100 CPU dot products; the arrays are rebuilt on every iteration.
    start = time.time()
    for i in range(100):
        np.dot(np.arange(1000000, dtype=np.float64),
               np.arange(1000000, dtype=np.float64))
    elapsedtime = time.time() - start
    return elapsedtime


def acceleratedot():
    # Same loop, but calling the Accelerate (cuBLAS) dot with NumPy inputs.
    start = time.time()
    for i in range(100):
        gpu_dot(np.arange(1000000, dtype=np.float64),
                np.arange(1000000, dtype=np.float64))
    elapsedtime = time.time() - start
    return elapsedtime
>>> numpydot()
0.6446375846862793
>>> acceleratedot()
1.33168363571167
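One thing worth checking: the benchmark above builds both arrays inside the timed loop, so each figure includes two `np.arange` calls per iteration, and the GPU figure presumably also includes whatever host-to-device transfer `gpu_dot` performs for NumPy inputs. Below is a minimal sketch of a helper that times setup and the call separately. The names `time_calls` and `py_dot` are hypothetical and are not part of NumPy or Accelerate; the demo uses plain Python lists so it runs without either library.

```python
import time


def time_calls(func, make_args, repeats=100):
    """Return (setup_seconds, call_seconds) summed over `repeats` runs.

    `make_args` builds the argument tuple and is timed separately from
    the call to `func`, so data creation does not pollute the result.
    """
    setup_total = 0.0
    call_total = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        args = make_args()          # data creation only
        t1 = time.perf_counter()
        func(*args)                 # the operation being measured
        t2 = time.perf_counter()
        setup_total += t1 - t0
        call_total += t2 - t1
    return setup_total, call_total


def py_dot(a, b):
    # Pure-Python stand-in for np.dot / gpu_dot, used only for the demo.
    return sum(x * y for x, y in zip(a, b))


setup_s, call_s = time_calls(
    py_dot,
    lambda: (list(range(1000)), list(range(1000))),
    repeats=10,
)
print("setup: %.6f s, call: %.6f s" % (setup_s, call_s))
```

With the question's code, the same helper could be called as `time_calls(np.dot, lambda: (np.arange(1000000, dtype=np.float64), np.arange(1000000, dtype=np.float64)))` and again with `gpu_dot` in place of `np.dot`, which would show how much of each total is array creation rather than the dot product itself.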
I figured out that shared arrays are created with Numba, which is a separate library. The documentation for them is on the Numba site.