Pandas: Can't filter based on string equality

Using pandas 0.16.2 on Python 2.7, OS X.

I read a dataframe from a csv file like this:

import pandas as pd

data = pd.read_csv("my_csv_file.csv",sep='\t', skiprows=(0), header=(0))

      

Output of data.dtypes:

name       object
weight     float64
ethnicity  object
dtype: object

      

I was expecting string types for name and ethnicity, but I found explanations on SO of why they are reported as "object" in newer versions of pandas.

Now I want to select rows by ethnicity, for example:

data[data['ethnicity']=='Asian']
Out[3]: 
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []

      

I get the same result with data[data.ethnicity=='Asian'] or data[data['ethnicity']=="Asian"].

But when I try the following:

data[data['ethnicity'].str.contains('Asian')].head(3)

      

I am getting the results that I want.

However, I don't want to use "contains" - I would like to check for direct equality.

Note that data[data['ethnicity'].str=='Asian'] raises an error.

Am I doing something wrong? How do I do this correctly?



2 answers


Your strings probably contain leading or trailing whitespace, for example:

data = pd.DataFrame({'ethnicity':[' Asian', '  Asian']})
data.loc[data['ethnicity'].str.contains('Asian'), 'ethnicity'].tolist()
# [' Asian', '  Asian']
print(data[data['ethnicity'].str.contains('Asian')])

      

gives

  ethnicity
0     Asian
1     Asian

      

To remove leading and trailing whitespace from the strings you can use



data['ethnicity'] = data['ethnicity'].str.strip()

      

then

data.loc[data['ethnicity'] == 'Asian']

      

gives

  ethnicity
0     Asian
1     Asian
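
Before stripping, it can help to confirm that whitespace really is the cause. The following is only a rough sketch: the sample frame is made up and stands in for the CSV from the question, and the names raw_values and n_padded are hypothetical. It prints the repr() of each distinct value so that padding becomes visible, then counts how many entries would change if stripped.

```python
import pandas as pd

# Made-up sample data standing in for the CSV from the question
data = pd.DataFrame({'ethnicity': [' Asian', 'Asian ', 'Asian']})

# repr() puts quotes around each value, so padding becomes visible
raw_values = [repr(v) for v in data['ethnicity'].unique()]
print(raw_values)

# Count the entries that differ from their stripped version
stripped = data['ethnicity'].str.strip()
n_padded = int((stripped != data['ethnicity']).sum())
print(n_padded)
```

If the count is zero, whitespace is not the problem and the mismatch has some other cause, such as different capitalization.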

      



You can try this:



data[data['ethnicity'].str.strip()=='Asian']
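
Unlike assigning the stripped column back, this form only strips while building the boolean mask, so the stored values keep their whitespace. A small sketch with made-up data (the frame and the names mask and result are assumptions for illustration):

```python
import pandas as pd

# Made-up sample data; two of the three rows are padded variants of 'Asian'
data = pd.DataFrame({'name': ['a', 'b', 'c'],
                     'ethnicity': [' Asian', 'Asian ', 'Other']})

# Strip only for the comparison; the column itself is not modified
mask = data['ethnicity'].str.strip() == 'Asian'
result = data[mask]
print(result)

# The original values still carry their whitespace
print(data['ethnicity'].tolist())
```

If the column is filtered more than once, stripping it once up front and assigning it back is probably the simpler choice.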

      
