Pandas: how to delete rows with a value ending with a specific string?
I have a pandas DataFrame like this:
import pandas as pd
mail = pd.DataFrame({'mail' : ['adv@gmail.com', 'fhngn@gmail.com', 'foinfo@yahoo.com', 'njfjrnfjrn@yahoo.com', 'nfjebfjen@hotmail.com', 'gnrgiprou@hotmail.com', 'jfei@hotmail.com']})
It looks as follows:
mail
0 adv@gmail.com
1 fhngn@gmail.com
2 foinfo@yahoo.com
3 njfjrnfjrn@yahoo.com
4 nfjebfjen@hotmail.com
5 gnrgiprou@hotmail.com
6 jfei@hotmail.com
What I want to do is filter out (exclude) all those rows where the value in the column mail ends with "@gmail.com".
You can use str.endswith and negate the resulting boolean Series with ~:
mail[~mail['mail'].str.endswith('@gmail.com')]
Which produces:
mail
2 foinfo@yahoo.com
3 njfjrnfjrn@yahoo.com
4 nfjebfjen@hotmail.com
5 gnrgiprou@hotmail.com
6 jfei@hotmail.com
Pandas has many other vectorized string operations, accessible through the .str accessor. Many of them will be instantly familiar from Python's own string methods, but they come with built-in handling of NaN values.
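To illustrate the NaN handling mentioned above, here is a minimal sketch. The missing value in the data is an assumption added for illustration; it is not part of the question. How a missing entry shows up in the mask by default can vary between pandas versions, so the sketch passes na=False to state the choice explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a few addresses from the question plus one missing value.
mail = pd.DataFrame({'mail': ['adv@gmail.com', 'foinfo@yahoo.com', np.nan, 'jfei@hotmail.com']})

# na=False treats a missing value as "does not match",
# so the mask is purely boolean and safe to negate with ~.
is_gmail = mail['mail'].str.endswith('@gmail.com', na=False)

# Keep the rows that do not end with '@gmail.com' (the NaN row is kept as well).
filtered = mail[~is_gmail]
print(filtered)
```

If rows with a missing address should be dropped too, na=True would mark them as matches instead.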
A column of type str has a .str accessor through which you can call the standard functions defined for a single str:
In [6]: mail['mail'].str.endswith('gmail.com')
Out[6]:
0 True
1 True
2 False
3 False
4 False
5 False
6 False
Name: mail, dtype: bool
Then you can negate this boolean Series with ~ and use it to filter the DataFrame:
In [7]: mail[~mail['mail'].str.endswith('gmail.com')]
Out[7]:
mail
2 foinfo@yahoo.com
3 njfjrnfjrn@yahoo.com
4 nfjebfjen@hotmail.com
5 gnrgiprou@hotmail.com
6 jfei@hotmail.com
A similar accessor, .dt, exists to access date/time-related properties of a column if it contains datetime data.
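As a rough sketch of how .dt can be used for the same kind of filtering: the signup column and its dates are hypothetical, invented here only to have datetime data to work with.

```python
import pandas as pd

# Hypothetical data: the 'signup' column is an assumption, not part of the question.
df = pd.DataFrame({
    'mail': ['adv@gmail.com', 'foinfo@yahoo.com', 'jfei@hotmail.com'],
    'signup': pd.to_datetime(['2015-01-15', '2015-06-30', '2016-03-01']),
})

# .dt exposes datetime properties element-wise, much as .str exposes string methods.
signup_year = df['signup'].dt.year

# A boolean mask built from a .dt property filters rows the same way as a .str mask.
signed_up_2015 = df[signup_year == 2015]
print(signed_up_2015)
```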