I am using a dataset as below. Rows show invoice numbers, columns show products. I want to show the number of products on the same invoice as a matrix (i.e. there will be products in both rows and columns, the intersection of the row and column will show how many times those 2 products are on the same invoice. How can I do that? Thanks.
Note: '1' indicates that the product is included in that invoice, and '0' indicates that it is not. finally i want to get a matrix like in the picture. intersecting cells show the number of sales of the relevant product pair.
import pandas as pd
ids = ['invoice_1','invoice_2','invoice_3','invoice_4','invoice_5','invoice_6']
A= [0,0,1,0,1,1]
B= [0,1,1,0,1,1]
C= [1,1,1,0,1,0]
D= [1,0,0,1,1,0]
df=pd.DataFrame.from_dict({'A':A, 'B':B, 'C':C, 'D':D})
df.index=ids
Actually I want to get Table 2 from Table 1. AA=3 because product A is included in 3 invoices (rows) totally. AB=4 because A and B are included in 4 invoices (rows) together. Note: Even if the AA, BB, CC , DD cells are not full, it does not matter. Binary products (like AB, DC etc.) are important to me.
Table 1
A B C D
invoice_1 0 0 1 1
invoice_2 0 1 1 0
invoice_3 1 1 1 0
invoice_4 0 0 0 1
invoice_5 1 1 1 1
invoice_6 1 1 0 0
invoice_7 1 1 0 0
Table 2
A B C D
A 4 4 2 1
B 4 4 3 1
C 2 3 4 2
D 1 1 2 3
The OP made two mistakes here. The first one is the input to generate the intended Table 1 should be:
Table 2 is actually the dot product of 2 matrices, i.e. matrix
dfand its transpose. One can use numpy.dot to produce it.The second mistake I pointed out is BB where it should be 5 instead of 4.