I'm trying to calculate the cosine similarity between every value in one array and every value in another.
The 1,000 × 20,000 calculations take more than 10 minutes.
Code:
from gensim import matutils
# array_A contains 1,000 TF-IDF values
# array_B contains 20,000 TF-IDF values
for x in array_A:
    for y in array_B:
        matutils.cossim(x, y)
I need to use the gensim package for both the TF-IDF values and the similarity calculation.
Can someone please give me some advice on how to speed this up?
Use memoization, and maybe also use tuples for the arrays (it may be faster):
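A minimal sketch of that idea, assuming each vector is stored as a tuple of (term_id, weight) pairs so that it is hashable and can serve as a cache key. `cached_cossim` is a hypothetical pure-Python stand-in for `matutils.cossim`, and the toy data is made up for illustration:

```python
from functools import lru_cache
from math import sqrt


@lru_cache(maxsize=None)
def cached_cossim(vec1, vec2):
    """Cosine similarity of two sparse vectors.

    Each vector is a tuple of (term_id, weight) pairs; tuples are
    hashable, which is what lets lru_cache memoize the result.
    Hypothetical stand-in for matutils.cossim.
    """
    d1, d2 = dict(vec1), dict(vec2)
    norm1 = sqrt(sum(w * w for w in d1.values()))
    norm2 = sqrt(sum(w * w for w in d2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    # Iterate over the shorter vector when computing the dot product.
    if len(d2) < len(d1):
        d1, d2 = d2, d1
    dot = sum(w * d2.get(term_id, 0.0) for term_id, w in d1.items())
    return dot / (norm1 * norm2)


# Made-up toy data; the first and third entries of array_B are identical,
# so the second lookup for that vector is served from the cache.
array_A = [((0, 0.6), (1, 0.8)), ((2, 1.0),)]
array_B = [((0, 0.6), (1, 0.8)), ((1, 1.0),), ((0, 0.6), (1, 0.8))]

similarities = [[cached_cossim(x, y) for y in array_B] for x in array_A]
print(cached_cossim.cache_info())
```

Note that memoization only saves work when the same (x, y) pair shows up more than once; if all 1,000 × 20,000 pairs are distinct, every call is a cache miss and the cache just costs memory.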
EDIT: After using the code above, you may also want to add this, just in case you are doing something else with the data: