I want to build my own PCA in Python for a dataset of shape (1934, 32), stored as a NumPy array (binary image data). As part of the PCA I need to compute the scatter matrix. I have code that works fine on images and on arrays of shape (3, x), but it doesn't work on my data.
I tried changing the shapes passed to np.zeros and to reshape to 32 and 1934, but nothing works. Here is the code I'm using right now:
for i in range(X.shape[1]):
    scatter_matrix += (X[:,i].reshape(3,1) - mean_vector).dot((X[:,i].reshape(3,1) - mean_vector).T)
print('Scatter Matrix:\n', scatter_matrix)
The error is "Cannot convert an array of size 1934 into shape (3,1)"
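The reshape fails because X[:,i] is a column of X, which has 1934 entries when X has shape (1934, 32), while (3,1) is hard-coded for the (3, x) case. Below is a minimal sketch of the same loop with the dimension read from X.shape instead. It assumes X is laid out as (n_features, n_samples), as in the (3, x) example; the function name and the random test array are hypothetical, added only for illustration.

```python
import numpy as np

def scatter_matrix_loop(X):
    """Scatter matrix of X, assuming shape (n_features, n_samples)."""
    n_features, n_samples = X.shape
    # Mean of each feature, kept as a column vector of shape (n_features, 1).
    mean_vector = X.mean(axis=1).reshape(n_features, 1)
    scatter_matrix = np.zeros((n_features, n_features))
    for i in range(n_samples):
        # Deviation of sample i from the mean, as a column vector.
        diff = X[:, i].reshape(n_features, 1) - mean_vector
        scatter_matrix += diff.dot(diff.T)
    return scatter_matrix

# Hypothetical small input in the (3, x) layout.
rng = np.random.default_rng(0)
X_small = rng.normal(size=(3, 10))
S_small = scatter_matrix_loop(X_small)
print('Scatter Matrix:\n', S_small)
```

With this layout the result is always (n_features, n_features), so passing a (1934, 32) array as-is treats 1934 as the number of features.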
I got past this error by using a scatter matrix of dimension (1934,1934) instead of (3,1), and it runs without errors for now. The code looks like below.
But now I am stuck on the dot product computation in the above code. It takes too much time, even in the Kaggle GPU environment. I cannot even get the result for a single iteration over the dataset.
Is there any solution available to make it faster?
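For reference, the per-sample loop can usually be replaced by a single matrix product, since the sum of outer products of the centered samples equals the centered data matrix times its transpose. The sketch below assumes the 1934 rows are samples and the 32 columns are features, so the array is transposed to (n_features, n_samples) first and the scatter matrix comes out as (32, 32) rather than (1934, 1934). That layout is a guess about the data, and the function name and the random stand-in array are hypothetical. Note also that NumPy runs on the CPU, so a GPU environment by itself does not speed up this computation.

```python
import numpy as np

def scatter_matrix_vectorized(X):
    """Scatter matrix of X, assuming shape (n_features, n_samples)."""
    # Subtract each feature's mean from its row.
    centered = X - X.mean(axis=1, keepdims=True)
    # Sum of outer products of the centered samples, as one matrix product.
    return centered @ centered.T

# Hypothetical stand-in for the real dataset: 1934 samples, 32 binary features.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(1934, 32)).astype(float)

# Assumption: rows are samples, so transpose to (n_features, n_samples).
S = scatter_matrix_vectorized(data.T)
print(S.shape)
```

If the 1934 dimension really is the feature axis, the same function can be called on the array without the transpose, at the cost of a (1934, 1934) result.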