Say I have a 500k x 500k user-item purchase-history binary matrix. I plan to compute item-item similarity (by conditional probability, i.e. P(item j | item i)) and store the results in an item-item matrix. For such a huge dataset, are there any recommended approaches to compute and store the data?
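To make the computation concrete, here is a minimal sketch of what I have in mind, using SciPy sparse matrices on a tiny hypothetical example (the real matrix would be a 500k x 500k sparse CSR matrix, and the variable names here are just placeholders):

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: rows = users, cols = items, 1 = purchased.
X = sparse.csr_matrix(np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]))

# Item-item co-occurrence counts: C[i, j] = users who bought both i and j.
C = (X.T @ X).tocsr()

# Conditional probability P(j | i) = C[i, j] / C[i, i].
item_counts = C.diagonal()              # C[i, i] = number of buyers of item i
inv = sparse.diags(1.0 / item_counts)   # assumes every item has >= 1 buyer
P = (inv @ C).tocsr()                   # row i holds P(j | i) for all j
```

At the real scale, even the sparse co-occurrence matrix may not fit in memory at once, so I would presumably need to compute `X.T @ X` in column chunks and write each chunk out as it is produced.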
I have looked into MongoDB for storing the precomputed results, but I'm still not sure which option works best. Any help or ideas would be appreciated.