There are two large matrices in my code, which have the same number of columns and different number of rows. Like A(20000X4000)
and B(30000X4000)
. Both are 0-1 sparse.
I should check each row of A with all rows of B and count the number of common 1s. For example if A(1,:)=[0 1 0 1 1]
and B([1 2],:)=[1 1 1 1 1;0 0 0 1 1]
, I need to get the result as 3
and 2
.
Assume there is a large 0-1 matrix C(50000X4000)
and its rows are labeled as either type A
or type B
. I should compare all rows of A
and B
together and enumerate 1s. If the number of 1s in each row of A and B are greater than some bounds, then I used those rows of A and B for the rest of calculation. So, I do not even need to store A
and B
, all I need is a list of pairs of indices of rows. Something like [(3,2),(3,5),...]
which shows I should use the third row of A
and the second row of B
, also the third of A
and the fifth of B
and so on.
The first thing that came into my mind was A*B'
, it gives the correct result, but practically it is very costly and in some cases impossible to do this multiplication.
I converted the matrices into single data type, and it became a bit faster. Sparse did not help.
The task seems easy, just counting common 1s of each row of A
and all rows of B
, but not easy to implement. Considering that the code should do this task like 1000 times, then it is practically impossible.
Any idea how to do enumerate the common ones without multiplication? (btw, loops also did not help).
Thanks.
If multiplication of the whole matrices cannot be done, one idea is to process a vertical stripe at a time. For each stripe you compute the desired result, and accumulate it with that of the preceding stripes:
Check: