Pairwise correlation
I have a DataFrame that looks something like this:
In [45]: df
Out[45]:
    Item_Id  Location_Id  date  price
0         A         5372     1    0.5
1         A         5372     2    NaN
2         A         5372     3    1.0
3         A         6065     1    1.0
4         A         6065     2    1.0
5         A         6065     3    3.0
6         A         7000     1    NaN
7         A         7000     2    NaN
8         A         7000     3    NaN
9         B         5372     1    3.0
10        B         5372     2    NaN
11        B         5372     3    1.0
12        B         6065     1    2.0
13        B         6065     2    1.0
14        B         6065     3    3.0
15        B         7000     1    8.0
16        B         7000     2    NaN
17        B         7000     3    9.0
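For anyone who wants to try this out, the sample frame above can be rebuilt directly. This is a minimal sketch that copies the printed values and uses numpy's nan for the missing prices; the default integer index matches the one shown.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame: 2 items x 3 locations x 3 dates (NaN = missing price).
df = pd.DataFrame({
    "Item_Id": ["A"] * 9 + ["B"] * 9,
    "Location_Id": ([5372] * 3 + [6065] * 3 + [7000] * 3) * 2,
    "date": [1, 2, 3] * 6,
    "price": [0.5, np.nan, 1.0, 1.0, 1.0, 3.0, np.nan, np.nan, np.nan,
              3.0, np.nan, 1.0, 2.0, 1.0, 3.0, 8.0, np.nan, 9.0],
})
print(df)
```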
For each Location_Id, I want to calculate the pairwise price correlation between every pair of Item_Id values. Note that although the sample data above contains only two unique Item_Id values, Item_Id takes dozens of different values in my real data. I tried using groupby.corr(), but that doesn't seem to give me what I want.
Ultimately, I want N DataFrames, where N is the number of unique Location_Id values in df. Each of the N DataFrames should be a square matrix of price correlations between all pairs of Item_Id values present in that particular Location_Id. Thus, each of the N DataFrames will have J rows and J columns, where J is the number of unique Item_Id values in that Location_Id group.
You can group by Location_Id, then pivot each group so that date is the index and Item_Id the columns, and compute the correlations:
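>>> # Reshape each group to dates x items, then correlate the item columns pairwise
>>> corr = lambda obj: obj.pivot(index='date', columns='Item_Id', values='price').corr()
>>> df.groupby('Location_Id').apply(corr)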
Item_Id                  A       B
Location_Id Item_Id
5372        A        1.000  -1.000
            B       -1.000   1.000
6065        A        1.000   0.866
            B        0.866   1.000
7000        A          NaN     NaN
            B          NaN   1.000
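This gives you a 2 x 2 matrix (J x J in general) for each Location_Id. Item A's row and column are NaN at location 7000 because all of A's prices there are missing.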
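Since the question asks for N separate DataFrames rather than one stacked frame, a dictionary keyed by Location_Id may be closer to what was asked. The sketch below is one way to build it, assuming pandas 2.x (where pivot takes keyword arguments) and rebuilding the sample data from the question; the names per_location and matrix are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (NaN marks a missing price).
df = pd.DataFrame({
    "Item_Id": ["A"] * 9 + ["B"] * 9,
    "Location_Id": ([5372] * 3 + [6065] * 3 + [7000] * 3) * 2,
    "date": [1, 2, 3] * 6,
    "price": [0.5, np.nan, 1.0, 1.0, 1.0, 3.0, np.nan, np.nan, np.nan,
              3.0, np.nan, 1.0, 2.0, 1.0, 3.0, 8.0, np.nan, 9.0],
})

# One J x J correlation matrix per Location_Id, keyed by the location.
# Within each group, pivot makes dates the rows and items the columns;
# DataFrame.corr then correlates every pair of item columns, excluding
# NaN pairwise (only dates where both prices are present are used).
per_location = {
    location_id: group.pivot(index="date", columns="Item_Id", values="price").corr()
    for location_id, group in df.groupby("Location_Id")
}

for location_id, matrix in per_location.items():
    print(f"Location_Id {location_id}:")
    print(matrix, end="\n\n")
```

If you already have the stacked frame returned by df.groupby('Location_Id').apply(corr), calling .xs(5372, level='Location_Id') on it should give the same block for a single location. Also keep in mind that with few dates some entries rest on very little data: the 5372 values of -1 come from just two shared dates. DataFrame.corr accepts a min_periods argument, so .corr(min_periods=3) would report those entries as NaN instead.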