I'm new to Python, and i'm trying to calculate Page Rank vector according to this equation in Python:

Where Pi(k) is Page-rank vector after k-Th iteration, G is the Google matrix, H is Hyperlink matrix, a is a dangling node vector, alpha = 0.85 and e is vector of ones.
The calculation with G takes a lot of time, while using the Hyperlink matrix H, which is sparse matrix, should take significantly less time.
Here's my code:
for i in range(1, k_steps+1):
for j in range(0, len(dictionary_urls)):
for k in range(0, len(dictionary_urls)):
if matrix_H[k][j] != 0:
matrix_pi_k[i][j] += matrix_pi_k[i-1][k] * float(matrix_H[k][j])
alpha_pi_k_a += matrix_pi_k[i-1][k]*float(vector_a[k])
alpha_pi_k_a = alpha_pi_k_a * float(alpha)
alpha_pi_k_a = alpha_pi_k_a + float((1- alpha))
alpha_pi_k_a = alpha_pi_k_a / float(len(dictionary_urls))
matrix_pi_k[i][j] = matrix_pi_k[i][j] * float(alpha)
matrix_pi_k[i][j] = matrix_pi_k[i][j] + float(alpha_pi_k_a)
alpha_pi_k_a = 0
k_steps is the number of iteration needed.
dictionary_links contains all the URLs.
After code execution, matrix_pi_k should have all the Pi vector's
I calculated all the variables that needed. I got a run time using H matrix is almost equal to run time using G matrix, although, in theory it should be different.
Why? And what should I change to reduce the run time?
Thank you.
The problem is that you're multiplying a sparse matrix by a dense vector using the same-old dense matrix-vector multiplication algorithm. You won't see any speedups that way.
Suppose you have an
nxnmatrixA(dense or sparse) and ann-vectorx. To computey = Ax, we can write:This works whether the matrix
Ais dense or sparse. Suppose, though, thatAis sparse. We still loop over all columns ofAto compute a single entry ofy, even though most of the entries will be zero. So the outer loop goes throughniterations, and the inner loop also goes throughniterations.If we know which entries of
Aare nonzero, we can do much better. Suppose we have a list of all of the nonzero entries of rowi, call itnonzero[i]. Then we can replace the inner loop with iteration over this list:So while our outer loop does
niterations, the inner loop only does as many iterations as there are nonzero entries.This is where the speedup comes with sparse matrix-vector multiplication.
Use
numpy!But you have another problem: you're trying to do matrix multiplication with pure Python, which (due to type-checking, non-contiguous data structures, etc.) is slow. The solution is to use
numpy, which provides fast algorithms and data structures. Then you can usescipy's sparse matrices, as they implement fast sparse matrix-vector multiplication for you.Experiment
We can show all of this with a quick experiment. First we'll generate a
10,000 x 10,000dense matrixA:Then we'll make a sparse matrix
Bby thresholdingA.Bis the same size asA, but only 10% of its entries are nonzero:Now we'll make a dense vector to multiply
AandBwith:It takes the same amount of time to compute
Axas it does to computeBx, even thoughBis "sparse". Of course, it isn't really sparse: it's stored as a dense matrix with a lot of zero entries. Let's make it sparse:There's our speedup! Now, just for fun, what if we store
Aas a sparse matrix, even though it's really dense?Ouch! But this is to be expected, as storing
Aas a sparse matrix will incur some overhead during the multiplication.