Pandas equivalent for grep

I'm new to pandas. For a dataframe like:

N  Chem    Val
A  Sodium  9
B  Sodium  10
A  Chlorid 7
B  Chlorid 10
A  Sodium  17

      

I would like to do something like grep in bash to select the rows containing 'A' in the 1st column and 'Sodium' in the 2nd column:

A  Sodium  9
A  Sodium  17

      

How can I do it? I think I need to use df[].str.contains(). Thanks.



3 answers


You can use .str.contains() on a dataframe column to return a boolean Series. You can also perform logical AND and OR operations (with & and |) on several Series. Finally, passing a boolean Series as a key to a dataframe returns only the rows where it is True.



bool1 = df.N.str.contains('A')          # True for rows where N contains 'A'
bool2 = df.Chem.str.contains('Sodium')  # True for rows where Chem contains 'Sodium'
df[bool1 & bool2]   # selects rows where both conditions are True

returns (without including the index):
N  Chem    Val
A  Sodium  9
A  Sodium  17
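
As a minimal, self-contained sketch of the steps above: the dataframe is rebuilt by hand from the question's sample, and the names mask_n, mask_chem and anchored are only illustrative. Note that .str.contains() does substring matching and treats the pattern as a regular expression by default, so a grep-like exact match needs anchors (or ==):

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Boolean Series: True where the cell contains the given substring
mask_n = df["N"].str.contains("A")
mask_chem = df["Chem"].str.contains("Sodium")

# Combine with & (AND) and use the result to index the dataframe
result = df[mask_n & mask_chem]
print(result.to_string(index=False))

# Anchored regular expressions match the whole cell, like grep '^A$'
anchored = df[df["N"].str.contains(r"^A$") & df["Chem"].str.contains(r"^Sodium$")]
```

On this sample both selections pick the same rows; they would differ if a column held values such as 'AB' or 'Sodium_A'.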

      



In my opinion, using query is the most natural way to express this kind of selection:



df.query('N == "A" & Chem == "Sodium"')

   N    Chem  Val
0  A  Sodium    9
4  A  Sodium   17
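
If the values to match live in Python variables, query can reference them with the @ prefix. A small sketch, assuming the question's sample data rebuilt by hand; n_value and chem_value are hypothetical names chosen for illustration:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Hypothetical variables holding the values to match
n_value = "A"
chem_value = "Sodium"

# @name refers to a Python variable from the surrounding scope
selected = df.query("N == @n_value and Chem == @chem_value")
print(selected)
```

This keeps the query string fixed while the matched values change, which avoids building the expression by string concatenation.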

      



If you just want to select rows by exact values in both columns, it is better not to use contains. contains is meant for cases where you need substring matching, for example picking out sodium_A, sodium_B, etc. from the other rows, which also means it may be slower than a plain selection on multiple conditions.

import pandas as pd

# Your sample data
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_table('sample.txt', header=None, sep=r'\s+')

print(df[(df.loc[:, 0] == 'A') & (df.loc[:, 1] == 'Sodium')])

   0       1   2
1  A  Sodium   9
5  A  Sodium  17
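
To illustrate the difference between exact comparison and contains, here is a small sketch. The data is hypothetical (a 'Sodium_A' entry is added on purpose, it is not in the question's sample), and the names exact, substring and several are only for illustration:

```python
import pandas as pd

# Hypothetical data: 'Sodium_A' is added to show where the two approaches differ
df = pd.DataFrame({
    "N": ["A", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium_A", "Sodium", "Chlorid"],
    "Val": [9, 3, 10, 7],
})

# Exact comparison: only cells equal to 'Sodium' match
exact = df[(df["N"] == "A") & (df["Chem"] == "Sodium")]

# Substring match: 'Sodium_A' matches as well
substring = df[(df["N"] == "A") & df["Chem"].str.contains("Sodium")]

# isin selects rows whose value equals any entry of a list
several = df[df["Chem"].isin(["Sodium", "Chlorid"])]

print(exact)
print(substring)
print(several)
```

Here the exact selection returns one row while the substring selection returns two, so the choice depends on whether partial matches are wanted.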

      
