I have a dataframe that looks similar to this:
In [45]: df
Out[45]:
   Item_Id  Location_Id  date  price
0        A         5372     1    0.5
1        A         5372     2    NaN
2        A         5372     3    1.0
3        A         6065     1    1.0
4        A         6065     2    1.0
5        A         6065     3    3.0
6        A         7000     1    NaN
7        A         7000     2    NaN
8        A         7000     3    NaN
9        B         5372     1    3.0
10       B         5372     2    NaN
11       B         5372     3    1.0
12       B         6065     1    2.0
13       B         6065     2    1.0
14       B         6065     3    3.0
15       B         7000     1    8.0
16       B         7000     2    NaN
17       B         7000     3    9.0
For every Item_Id in each Location_Id category, I want to compute a pairwise correlation of prices between every Item_Id pair. Note that while the sample data above has only two unique Item_Id values, Item_Id takes on tens of different values in my real data. I have tried using groupby.corr(), but this doesn't seem to give me what I want.
Ultimately, I want N dataframes, where N is the number of unique Location_Id values in df. Each of the N dataframes will be a square correlation matrix of prices between all pairwise combinations of Item_Id present in a specific Location_Id category. So each of the N dataframes will have J rows and columns, where J is the number of unique Item_Id values in that specific Location_Id group.
You can group by Location_Id, then pivot each group on date and Item_Id so that every Item_Id becomes a column, and call corr() on the pivoted frame. For the sample data this gives you a 2 x 2 matrix (in general, J x J) for each Location_Id.
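A minimal sketch of that approach, reconstructing the sample dataframe from the question (the dictionary keyed by Location_Id is just one convenient way to hold the N matrices; names like corrs are my own):

```python
import numpy as np
import pandas as pd

# Reconstruct the sample dataframe from the question.
df = pd.DataFrame({
    'Item_Id':     ['A'] * 9 + ['B'] * 9,
    'Location_Id': ([5372] * 3 + [6065] * 3 + [7000] * 3) * 2,
    'date':        [1, 2, 3] * 6,
    'price':       [0.5, np.nan, 1.0, 1.0, 1.0, 3.0, np.nan, np.nan, np.nan,
                    3.0, np.nan, 1.0, 2.0, 1.0, 3.0, 8.0, np.nan, 9.0],
})

# One J x J correlation matrix per Location_Id: pivot so each Item_Id
# becomes a column indexed by date, then correlate the columns.
# DataFrame.corr() uses pairwise-complete observations, so NaN prices
# are dropped pair by pair rather than poisoning the whole matrix.
corrs = {
    loc: grp.pivot(index='date', columns='Item_Id', values='price').corr()
    for loc, grp in df.groupby('Location_Id')
}
```

Note how the NaNs play out in the sample: at location 5372 only dates 1 and 3 have prices for both items, so with two shared points the off-diagonal correlation is exactly -1.0; at location 7000 item A has no prices at all, so its row and column are entirely NaN while B correlates with itself at 1.0.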