Cleaner pandas/NumPy code for building an equivalence matrix?
I have a pandas DataFrame and would like to create an equivalence matrix (or whatever it is called) where each cell holds one value if df.Col[i] == df.Col[j] and another value when they differ (!=).
The following code works:
import pandas as pd
df = pd.DataFrame({"Col": [1, 2, 3, 1, 2]}, index=["A", "B", "C", "D", "E"])
df
Col
A 1
B 2
C 3
D 1
E 2
sm = pd.DataFrame(columns=df.index, index=df.index)
for i in df.index:
    for j in df.index:
        if df.Col[i] == df.Col[j]:
            sm.loc[i, j] = 3
        else:
            sm.loc[i, j] = -1
sm
A B C D E
A 3 -1 -1 3 -1
B -1 3 -1 -1 3
C -1 -1 3 -1 -1
D 3 -1 -1 3 -1
E -1 3 -1 -1 3
But there must be a better way. Perhaps using numpy? Any thoughts?
Edit:
Building on what piRsquared wrote, maybe something like this?
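m = df.values == df.values[:, 0]  # m[i, j]: rows i and j share a Col value
# .where(cond, other) replaces cells where cond is False: matches get 3, the rest -1
sm = pd.DataFrame(None, df.index, df.index).where(~m, 3).where(m, -1)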
Can this be improved?
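A possible simplification, sketched under the assumption that the column is 1-D and plain NumPy broadcasting is acceptable: a single np.where call picks 3 where the mask is True and -1 elsewhere, so there is no need to reason about the order of two chained .where calls. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col": [1, 2, 3, 1, 2]}, index=["A", "B", "C", "D", "E"])

# 1-D array of the column's values
a = df["Col"].to_numpy()

# a[:, None] has shape (n, 1) and a[None, :] has shape (1, n);
# comparing them broadcasts to an (n, n) boolean matrix
m = a[:, None] == a[None, :]

# 3 where the two rows share a value, -1 where they differ
sm = pd.DataFrame(np.where(m, 3, -1), index=df.index, columns=df.index)
print(sm)
```

This yields the same 3/-1 matrix as the loop, but with an integer dtype instead of the object dtype the loop produces.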
3 answers
import numpy as np
import pandas as pd

# initialize your sm to 1s
sm = pd.DataFrame(columns=df.index, index=df.index, data=1)
# create a mask to indicate equivalence (n x n, where n = len(df))
mask = (np.asarray(df)[:, None] == np.asarray(df)).reshape(len(df), len(df))
# set non-equivalent elements to -1
sm = sm.where(mask, -1)
sm
Out[129]:
A B C D E
A 1 -1 -1 1 -1
B -1 1 -1 -1 1
C -1 -1 1 -1 -1
D 1 -1 -1 1 -1
E -1 1 -1 -1 1
Here is a compact solution using multiplication:
a = df.values
sm = pd.DataFrame(4*(a[:,0]==a)-1, df.index, df.index)
To get 1 for equal values and -1 for different ones instead, replace 4 with 2.
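The choice of 4 (or 2) follows from treating True as 1 and False as 0: (same - different) * mask + different gives same where the mask is True and different where it is False. With same=3, different=-1 that is 4*mask - 1; with same=1, different=-1 it is 2*mask - 1. A minimal sketch, where scaled_mask is a hypothetical helper introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Col": [1, 2, 3, 1, 2]}, index=["A", "B", "C", "D", "E"])
a = df.values  # shape (5, 1)


def scaled_mask(mask, same, different):
    """Hypothetical helper: map True -> same and False -> different.

    Booleans act as 1 and 0 in arithmetic, so
    (same - different) * mask + different equals `same` where mask is True
    and `different` where it is False.
    """
    return (same - different) * mask + different


mask = a[:, 0] == a  # (5,) vs (5, 1) broadcasts to a (5, 5) bool array
sm_3 = pd.DataFrame(scaled_mask(mask, 3, -1), df.index, df.index)  # 4*mask - 1
sm_1 = pd.DataFrame(scaled_mask(mask, 1, -1), df.index, df.index)  # 2*mask - 1
print(sm_3)
print(sm_1)
```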
Example run -
In [41]: df
Out[41]:
Col
A 1
B 2
C 3
D 1
E 2
In [42]: a = df.values
In [43]: pd.DataFrame(4*(a[:,0] == a)-1, df.index, df.index)
Out[43]:
A B C D E
A 3 -1 -1 3 -1
B -1 3 -1 -1 3
C -1 -1 3 -1 -1
D 3 -1 -1 3 -1
E -1 3 -1 -1 3
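To check that the one-liner really reproduces the question's double loop, the two results can be compared with pandas' testing helper. This is a sketch assuming the same five-row frame; the loop fills an object-dtype frame, so both sides are cast to int64 before comparing.

```python
import pandas as pd

df = pd.DataFrame({"Col": [1, 2, 3, 1, 2]}, index=["A", "B", "C", "D", "E"])

# original double loop from the question
loop = pd.DataFrame(columns=df.index, index=df.index)
for i in df.index:
    for j in df.index:
        loop.loc[i, j] = 3 if df.Col[i] == df.Col[j] else -1

# multiplication-based one-liner from this answer
a = df.values
vec = pd.DataFrame(4 * (a[:, 0] == a) - 1, df.index, df.index)

# raises AssertionError if the frames differ in values, index, or columns
pd.testing.assert_frame_equal(loop.astype("int64"), vec.astype("int64"))
print("loop and vectorized results match")
```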