Pandas: Can't filter based on string equality

Question


I'm using pandas 0.16.2 on Python 2.7, OS X.

I read a dataframe from a csv file like this:

import pandas as pd

data = pd.read_csv("my_csv_file.csv",sep='\t', skiprows=(0), header=(0))

The output of data.dtypes is:

name       object
weight     float64
ethnicity  object
dtype: object

I was expecting string types for name and ethnicity, but I found explanations here on SO of why string columns are reported as "object" in newer versions of pandas.

Now I want to select rows by ethnicity, for example:

data[data['ethnicity']=='Asian']
Out[3]: 
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []

I get the same result with data[data.ethnicity=='Asian'] or data[data['ethnicity']=="Asian"].

But when I try the following:

data[data['ethnicity'].str.contains('Asian')].head(3)

I get the rows that I want.

However, I don't want to use "contains"; I would like to check for exact equality.

Note that data[data['ethnicity'].str=='Asian'] raises an error.

Am I doing something wrong? What is the right way to do this?


python string pandas filtering selection

vpk 08 jul. 15 at 21:05


2 answers

You can try this:

data[data['ethnicity'].str.strip()=='Asian']


Daniel Martin 08 jul. At 21:27


unutbu · Accepted Answer · 2015-07-08T21:35:01+0000

Your strings probably contain leading or trailing whitespace, as in this example:

data = pd.DataFrame({'ethnicity':[' Asian', '  Asian']})
data.loc[data['ethnicity'].str.contains('Asian'), 'ethnicity'].tolist()
# [' Asian', '  Asian']
print(data[data['ethnicity'].str.contains('Asian')])

gives

  ethnicity
0     Asian
1     Asian
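Because the printed frame right-aligns its values, the extra spaces are easy to miss. A rough way to confirm that whitespace is the culprit is to look at the repr of the distinct values, or at the string lengths. The sample frame below is made up for illustration and is not the asker's data:

```python
import pandas as pd

# Hypothetical sample data with stray leading/trailing whitespace
data = pd.DataFrame({'ethnicity': [' Asian', '  Asian', 'Asian ']})

# repr() keeps the quotes, so the surrounding spaces become visible
distinct = [repr(v) for v in data['ethnicity'].unique()]
print(distinct)

# String lengths are another quick check: a clean 'Asian' has length 5
lengths = data['ethnicity'].str.len().tolist()
print(lengths)
```

If any length is greater than expected, or the quoted values show padding, an equality test against 'Asian' will not match those rows.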

To remove leading and trailing whitespace from the strings you can use

data['ethnicity'] = data['ethnicity'].str.strip()

then

data.loc[data['ethnicity'] == 'Asian']

gives

  ethnicity
0     Asian
1     Asian
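If several text columns come from the same file, they may all carry the same padding. The following is a minimal sketch of stripping each of them before filtering; the frame and the list of column names are assumptions chosen to resemble the question, not the asker's actual data:

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question
data = pd.DataFrame({
    'name': [' Ann', 'Bob ', 'Cy'],
    'weight': [50.0, 60.5, 70.2],
    'ethnicity': [' Asian', 'Asian  ', 'Other'],
})

# Strip leading/trailing whitespace in each text column (names assumed)
for col in ['name', 'ethnicity']:
    data[col] = data[col].str.strip()

# Exact equality now matches the cleaned values
result = data.loc[data['ethnicity'] == 'Asian']
print(result)
```

To leave the stored values untouched, the strip can instead be applied only inside the comparison, as in data[data['ethnicity'].str.strip() == 'Asian'].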
