There are two large matrices in my code, which have the same number of columns and different number of rows. Like A(20000X4000) and B(30000X4000). Both are 0-1 sparse.
I should check each row of A with all rows of B and count the number of common 1s. For example if A(1,:)=[0 1 0 1 1] and B([1 2],:)=[1 1 1 1 1;0 0 0 1 1], I need to get the result as 3 and 2.
Assume there is a large 0-1 matrix C(50000X4000) and its rows are labeled as either type A or type B. I should compare all rows of A and B together and enumerate 1s. If the number of 1s in each row of A and B are greater than some bounds, then I used those rows of A and B for the rest of calculation. So, I do not even need to store A and B, all I need is a list of pairs of indices of rows. Something like [(3,2),(3,5),...] which shows I should use the third row of A and the second row of B, also the third of A and the fifth of B and so on.
The first thing that came into my mind was A*B', it gives the correct result, but practically it is very costly and in some cases impossible to do this multiplication.
I converted the matrices into single data type, and it became a bit faster. Sparse did not help.
The task seems easy, just counting common 1s of each row of A and all rows of B, but not easy to implement. Considering that the code should do this task like 1000 times, then it is practically impossible.
Any idea how to do enumerate the common ones without multiplication? (btw, loops also did not help).
Thanks.
I do not know if this is really any better than what you have, because it does still have a for loop in it, but if someone can figure out how to remove that for loop you should be good to go.
Either way, the process is going to take a long time because you are comparing 30000 rows with 4000 elements each, 20000 times, or approximately
2.4e+12comparisons. That is a huge task, and will definitely take a long time. Possibly try using parallel computing if you need it to be faster.