Pandas return index and column name based on item value
I am trying to return a column name and index based on the value of an element. I have something like this:
So, I am trying to return the index and column names of all values where the value is > 0.75.
for date, row in df.iterrows():
    for item in row:
        if item > .75:
            print index, row
I wanted this to return "traffic and robbery". However, this returns all values. I haven't found an answer to this in the documentation, online, or here. Thank you in advance.
Using slightly different numbers (for no particular reason), you can stack the frame into a Series and then use boolean indexing:
In [11]: df.stack()
Out[11]:
assault  assault    1.00
         robbery    0.76
         traffic    0.60
robbery  assault    0.76
         robbery    1.00
         traffic    0.78
traffic  assault    0.68
         robbery    0.78
         traffic    1.00
dtype: float64
In [12]: s = df.stack()
In [13]: s[(s!=1) & (s>0.77)]
Out[13]:
robbery traffic 0.78
traffic robbery 0.78
dtype: float64
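For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same stack-then-filter idea. The question never shows its data, so the frame below is rebuilt by hand from the table posted in another answer in this thread (an assumption), and the names `hits` and `pairs` are only illustrative.

```python
import pandas as pd

labels = ["assault", "robbery", "traffic"]
# Assumed data: values copied from the table shown elsewhere in this thread
df = pd.DataFrame(
    [[1.00, 0.74, 0.68],
     [0.74, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels, columns=labels,
)

s = df.stack()                    # Series keyed by (row label, column label)
hits = s[(s != 1) & (s > 0.75)]   # skip the diagonal, keep values above 0.75
pairs = hits.index.tolist()       # list of (row label, column label) tuples
print(pairs)  # [('robbery', 'traffic'), ('traffic', 'robbery')]
```

Each pair shows up twice because the matrix is symmetric.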
You can use a bit of numpy to remove the duplicates. One way* is to zero out everything that is not above the diagonal with triu (unfortunately this does not come back as a DataFrame :( ):
In [21]: np.triu(df, 1)
Out[21]:
array([[ 0.  ,  0.76,  0.6 ],
       [ 0.  ,  0.  ,  0.78],
       [ 0.  ,  0.  ,  0.  ]])
In [22]: s = pd.DataFrame(np.triu(df, 1), df.index, df.columns).stack() > 0.77
In [23]: s[s]
Out[23]:
robbery traffic True
dtype: bool
In [24]: s[s].index.tolist()
Out[24]: [('robbery', 'traffic')]
* I suspect there are better ways ...
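One possible variant, offered only as a sketch: build a boolean upper-triangle mask and pass it to `DataFrame.where`, which keeps the result as a labelled DataFrame instead of a bare array. The frame is retyped from the stacked output above, and the names `upper` and `upper_pairs` are illustrative.

```python
import numpy as np
import pandas as pd

labels = ["assault", "robbery", "traffic"]
# Values retyped from the stacked output shown in this answer
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels, columns=labels,
)

# True strictly above the diagonal, False on and below it
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# where() keeps the labels; cells outside the mask become NaN
s = df.where(upper).stack()

# NaN never compares greater than the threshold, so masked cells drop out
upper_pairs = s[s > 0.77].index.tolist()
print(upper_pairs)  # [('robbery', 'traffic')]
```

This relies on the matrix being symmetric, so that the lower triangle carries no extra information.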
If you want to keep the for loops, you can iterate over the index and the columns:
for i in df.index:
    for j in df.columns:
        # .loc[row, column] makes the lookup order explicit
        if (i != j) and (df.loc[i, j] > 0.75):
            print(i, j)
Then the output will be as follows:
robbery traffic
traffic robbery
Update: As FooBar pointed out, this is inefficient. It is better to use something like a combination of FooBar's and Andy Hayden's approaches:
In [3]: df[(df>0.75) & (df!=1)].stack().drop_duplicates()
Out[3]: robbery traffic 0.78
dtype: float64
I start with
         assault  robbery  traffic
index
assault     1.00     0.74     0.68
robbery     0.74     1.00     0.78
traffic     0.68     0.78     1.00
and do
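# df is the frame above, with its row labels in an index named 'index'
df2 = df.stack().reset_index()
# drop repeated values first, then filter, so the mask lines up with the rows
df2 = df2.drop_duplicates(0)
df2[df2[0] > 0.75][['index', 'level_1']]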
     index  level_1
0  assault  assault
5  robbery  traffic
Here drop_duplicates() gets rid of the mirrored key pairs, but it assumes that each key pair has a unique value (which is debatable).
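To illustrate why that assumption is debatable, here is a small sketch with hypothetical values in which two different pairs happen to share 0.78. Deduplicating on the value silently drops one of them, while selecting by position (the upper-triangle mask from the numpy answer) keeps both. The data and the names `by_value` and `by_position` are made up for illustration.

```python
import numpy as np
import pandas as pd

labels = ["assault", "robbery", "traffic"]
# Hypothetical data: assault/robbery and robbery/traffic both equal 0.78
df = pd.DataFrame(
    [[1.00, 0.78, 0.68],
     [0.78, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels, columns=labels,
)

# Deduplicate on the value: distinct pairs with equal values collapse
s = df.stack()
by_value = s[(s > 0.75) & (s != 1)].drop_duplicates()
print(by_value.index.tolist())     # [('assault', 'robbery')]

# Deduplicate on position: keep only cells strictly above the diagonal
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)
t = df.where(upper).stack()
by_position = t[t > 0.75]
print(by_position.index.tolist())  # [('assault', 'robbery'), ('robbery', 'traffic')]
```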