Pandas equivalent for grep
I'm new to pandas. For a dataframe like:
N Chem Val
A Sodium 9
B Sodium 10
A Chlorid 7
B Chlorid 10
A Sodium 17
I would like to do something like grep in bash to select the rows containing 'A' in the 1st column and 'Sodium' in the 2nd column:
A Sodium 9
A Sodium 17
How can I do it? I think I need to use df[].str.contains()? Thanks.
You can use .str.contains() on a dataframe column to return a boolean Series. You can also combine several such Series with the logical operations and (&) and or (|). Finally, passing a boolean Series as a key to a dataframe returns only the rows where the Series is True.
bool1 = df.N.str.contains('A') # True for rows where N contains 'A'
bool2 = df.Chem.str.contains('Sodium') # True for rows where Chem contains 'Sodium'
df[bool1 & bool2] # selects rows where N contains 'A' AND Chem contains 'Sodium'
returns (without including the index):
N Chem Val
A Sodium 9
A Sodium 17
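For reference, here is a self-contained sketch of the approach described above. The dataframe is rebuilt inline from the question's sample data, and the variable names (mask_n, mask_chem, selected, and so on) are only illustrative. The last two selections are assumptions about what a grep user might also want: a case-insensitive match and an anchored regular expression.

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the snippet runs on its own.
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Each .str.contains() call returns a boolean Series aligned with df's index.
mask_n = df["N"].str.contains("A")
mask_chem = df["Chem"].str.contains("Sodium")

# & combines the masks element-wise; indexing with the result keeps True rows.
selected = df[mask_n & mask_chem]
print(selected.to_string(index=False))

# Roughly like `grep -i`: ignore case when matching.
case_insensitive = df[df["Chem"].str.contains("sodium", case=False)]

# Patterns are treated as regular expressions by default, so anchors work.
starts_with_chl = df[df["Chem"].str.contains(r"^Chl")]
```

Note that str.contains() is a substring (or regex) match, so 'A' would also match a value such as 'AB'. That is the same behaviour as grep, but it is not the same as testing for equality.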
If you just want to select rows by exact values in both columns, it is better not to use contains. contains is meant for the case where you need to pick out values such as sodium_A, sodium_B, etc. from the other rows, and because it does pattern matching it may be slower than a plain equality comparison.
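import pandas as pd
# Your sample data, split on any run of whitespace.
# header=None keeps the "N Chem Val" line as row 0, so the columns are labelled 0, 1, 2.
df = pd.read_table('sample.txt', header=None, sep=r'\s+')
# Exact match on both columns
print(df[(df.loc[:, 0] == 'A') & (df.loc[:, 1] == 'Sodium')])
Output: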
0 1 2
1 A Sodium 9
5 A Sodium 17
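To make the difference between the two approaches concrete, here is a small sketch using named columns instead of a file. The extra 'Sodium_A' row is hypothetical and is not part of the question's data; it is added only to show where equality and contains disagree. The df.query() line is an alternative spelling of the same equality filter, not something the answer above used.

```python
import pandas as pd

# Question's sample data plus one HYPOTHETICAL row ("A", "Sodium_A", 3),
# added only to show where the two approaches give different results.
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium", "Sodium_A"],
    "Val": [9, 10, 7, 10, 17, 3],
})

# Exact match: only rows whose values are exactly 'A' and 'Sodium'.
exact = df[(df["N"] == "A") & (df["Chem"] == "Sodium")]

# Substring match: also picks up 'Sodium_A'.
substring = df[df["N"].str.contains("A") & df["Chem"].str.contains("Sodium")]

# The same equality filter written as a query string.
via_query = df.query("N == 'A' and Chem == 'Sodium'")

print(exact)
print(substring)
```

Here exact and via_query keep the rows with Val 9 and 17, while substring additionally keeps the hypothetical 'Sodium_A' row.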