Pandas equivalent for grep

I'm new to pandas. For a dataframe like:

N  Chem    Val
A  Sodium  9
B  Sodium  10
A  Chlorid 7
B  Chlorid 10
A  Sodium  17

      

I would like to do something like grep in bash to select rows containing 'A' in the 1st column and 'Sodium' in the 2nd column:

A  Sodium  9
A  Sodium  17

      

How can I do it? I think I need to use df[].str.contains()? Thanks.

+3
python pandas grep dataframe




3 answers


You can use .str.contains() on a dataframe column to get a boolean Series. You can also combine boolean Series element-wise with the logical operators & (and) and | (or). Finally, passing a boolean Series as a key to a dataframe returns only the rows where the Series is True.



bool1 = df.N.str.contains('A')          # True for rows where N contains 'A'
bool2 = df.Chem.str.contains('Sodium')  # True for rows where Chem contains 'Sodium'
df[bool1 & bool2]   # selects rows where N contains 'A' AND Chem contains 'Sodium'

returns (without including the index):
N  Chem    Val
A  Sodium  9
A  Sodium  17
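
One thing worth keeping in mind with the approach above: .str.contains() is a substring (by default, regex) match, so it behaves like grep without anchors. The following is only a sketch, with the sample data rebuilt by hand and the variable names made up for illustration. It compares contains with .str.fullmatch(), which is closer to grep '^Sodium$'.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for this sketch
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Substring match; regex=False treats the pattern as a literal string
mask_contains = (
    df["N"].str.contains("A", regex=False)
    & df["Chem"].str.contains("Sodium", regex=False)
)

# Whole-string match, closer to an anchored grep pattern
mask_full = df["N"].str.fullmatch("A") & df["Chem"].str.fullmatch("Sodium")

selected = df[mask_contains]

# Where the two differ: a hypothetical value "AB" contains "A" but is not equal to it
labels = pd.Series(["A", "AB"])
contains_hits = labels.str.contains("A", regex=False).tolist()
fullmatch_hits = labels.str.fullmatch("A").tolist()
```

On this sample both masks select the same two rows; they only diverge when a value merely includes the pattern, as the "AB" case shows.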

      

+2




In my opinion, using query is the most natural way to express this type of selection.



df.query('N == "A" & Chem == "Sodium"')

   N    Chem  Val
0  A  Sodium    9
4  A  Sodium   17
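
If the values to match live in Python variables rather than in the query string, query can reference them with an @ prefix. A minimal sketch, assuming the sample data from the question; n_value and chem_value are names invented for this example.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for this sketch
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Hypothetical variables holding the values to filter on
n_value = "A"
chem_value = "Sodium"

# @name refers to a Python variable in the calling scope
result = df.query("N == @n_value and Chem == @chem_value")
print(result)
```

This avoids building the query string by hand and getting the nested quotes wrong.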

      

+2




If you just want to select rows by exact values in both columns, it is best not to use contains. contains is meant for the case where you need to pick out values such as sodium_A, sodium_B, etc. from the other rows by substring, which also means it may be slower than a plain equality comparison on multiple columns.

import pandas as pd

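# Your sample data; with header=None the "N Chem Val" line is read as a data row
# and the columns are labeled 0, 1, 2
df = pd.read_table('sample.txt', header=None, sep=r'\s+')

# Exact equality on the first two columns
print(df[(df[0] == 'A') & (df[1] == 'Sodium')])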

   0       1   2
1  A  Sodium   9
5  A  Sodium  17
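
The index above starts at 1 because the header line was read as data. As a rough variation, assuming the file's first line is meant to be the header, the same equality filter can use the column names. The sample text is inlined here through io.StringIO so the sketch runs without sample.txt.

```python
import io
import pandas as pd

# The question's sample, inlined in place of sample.txt
sample = """N  Chem    Val
A  Sodium  9
B  Sodium  10
A  Chlorid 7
B  Chlorid 10
A  Sodium  17
"""

# The first line is used as the header; columns are split on runs of whitespace
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Exact equality on both columns, combined with &
rows = df[(df["N"] == "A") & (df["Chem"] == "Sodium")]
print(rows)
```

Read this way, Val is parsed as integers and the matching rows keep index labels 0 and 4.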

      

+1








