I have a dataframe that looks similar to this:
In [45]: df
Out[45]:
   Item_Id  Location_Id  date  price
0        A         5372     1    0.5
1        A         5372     2    NaN
2        A         5372     3    1.0
3        A         6065     1    1.0
4        A         6065     2    1.0
5        A         6065     3    3.0
6        A         7000     1    NaN
7        A         7000     2    NaN
8        A         7000     3    NaN
9        B         5372     1    3.0
10       B         5372     2    NaN
11       B         5372     3    1.0
12       B         6065     1    2.0
13       B         6065     2    1.0
14       B         6065     3    3.0
15       B         7000     1    8.0
16       B         7000     2    NaN
17       B         7000     3    9.0
Within each Location_Id category, I want to compute the pairwise correlation of prices between every pair of Item_Id values. Note that while the sample data above contains only two unique Item_Id values, my real data has tens of them. I have tried groupby.corr(), but it doesn't seem to give me what I want.
Ultimately, I want N dataframes where N is the number of unique Location_Id values in df. Each of the N dataframes will be a square correlation matrix of prices between all pairwise combinations of Item_Id present in a specific Location_Id category. So each of the N dataframes will have J rows and columns, where J is the number of unique Item_Id values in that specific Location_Id group.
You can group by Location_Id, then pivot on date and Item_Id and compute the correlations; you get a 2 x 2 matrix (more generally, a J x J matrix) for each Location_Id.
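The answer's code block appears to have been lost in formatting; here is a minimal sketch of the approach it describes, rebuilding the sample data above (the variable names corrs, loc, and grp are my own):

```python
import pandas as pd
import numpy as np

# Rebuild the sample data from the question.
df = pd.DataFrame({
    'Item_Id': ['A'] * 9 + ['B'] * 9,
    'Location_Id': ([5372] * 3 + [6065] * 3 + [7000] * 3) * 2,
    'date': [1, 2, 3] * 6,
    'price': [0.5, np.nan, 1.0, 1.0, 1.0, 3.0, np.nan, np.nan, np.nan,
              3.0, np.nan, 1.0, 2.0, 1.0, 3.0, 8.0, np.nan, 9.0],
})

# For each Location_Id, pivot so that each Item_Id becomes a column
# indexed by date, then compute pairwise price correlations. NaNs are
# handled pairwise by DataFrame.corr().
corrs = {
    loc: grp.pivot(index='date', columns='Item_Id', values='price').corr()
    for loc, grp in df.groupby('Location_Id')
}

print(corrs[5372])
```

Each value in corrs is a J x J correlation matrix for one Location_Id (here J = 2). Note that for Location_Id 7000, item A has no prices at all, so its correlations come out as NaN.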