Pandas: Can't filter based on string equality
Using pandas 0.16.2 on python 2.7, OSX.
I read a dataframe from a csv file like this:
import pandas as pd
data = pd.read_csv("my_csv_file.csv",sep='\t', skiprows=(0), header=(0))
The output of data.dtypes is:
name object
weight float64
ethnicity object
dtype: object
I was expecting string types for name and ethnicity, but I found explanations on SO of why they show up as "object" in newer versions of pandas.
Now I want to select rows by ethnicity, for example:
data[data['ethnicity']=='Asian']
Out[3]:
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []
I get the same result with data[data.ethnicity=='Asian'] or data[data['ethnicity']=="Asian"].
But when I try the following:
data[data['ethnicity'].str.contains('Asian')].head(3)
I am getting the results that I want.
However, I don't want to use "contains" - I would like to check for direct equality.
Note that data[data['ethnicity'].str=='Asian'] raises an error.
Am I doing something wrong? How to do it right?
There are probably leading or trailing spaces in your strings, as in this example:
data = pd.DataFrame({'ethnicity':[' Asian', ' Asian']})
data.loc[data['ethnicity'].str.contains('Asian'), 'ethnicity'].tolist()
# [' Asian', ' Asian']
print(data[data['ethnicity'].str.contains('Asian')])
gives
ethnicity
0 Asian
1 Asian
To remove leading or trailing spaces from the strings you can use
data['ethnicity'] = data['ethnicity'].str.strip()
then
data.loc[data['ethnicity'] == 'Asian']
gives
ethnicity
0 Asian
1 Asian
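If you want to confirm that stray whitespace really is the cause before changing anything, here is a minimal sketch. The inline tab-separated text and the names and weights in it are made up for illustration and are not taken from your file. It assumes the padding is a single leading space, which is only a guess about your data. It also shows skipinitialspace=True as a possible alternative at read time; that option only drops spaces that follow the delimiter, so trailing spaces would still need str.strip().

```python
import io
import pandas as pd

# Hypothetical tab-separated data with a stray space before each ethnicity value.
raw = (
    "name\tweight\tethnicity\n"
    "Ann\t55.0\t Asian\n"
    "Bob\t70.5\t Asian\n"
    "Cid\t80.0\t Other\n"
)

data = pd.read_csv(io.StringIO(raw), sep="\t")

# Printing the unique values as a list shows the quotes, so padding is visible.
print(data["ethnicity"].unique().tolist())

# Exact comparison finds nothing while the padding is present.
before = data[data["ethnicity"] == "Asian"]

# Strip the column, then compare exactly.
data["ethnicity"] = data["ethnicity"].str.strip()
after = data[data["ethnicity"] == "Asian"]

# Alternative: skip spaces that follow the delimiter while reading.
data2 = pd.read_csv(io.StringIO(raw), sep="\t", skipinitialspace=True)
at_read = data2[data2["ethnicity"] == "Asian"]

print(len(before), len(after), len(at_read))
```

Stripping after the read is the more general fix, since it handles padding on both sides of the value.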