Python Pandas: filtering a data frame
I'm new to Pandas but wanted to try it after working with R for a while.
The problem I am facing is figuring out why a filter is not working on one of my data frames. I have a data frame data_df with several columns, one of which, c, contains country names. I am trying to filter out the rows where c is None, keeping only the rows that have a country.
My first attempt was to do this:
countries_df = data_df[data_df.c != None]
However, this returned 0 rows. This, on the other hand, worked:
countries_df = data_df[~data_df.c.isin([None])]
Can someone explain why? From the pandas docs, it looks like the first one should filter correctly.
A few rows of the data:
_heartbeat_ a al c cy g
0 NaN Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H... en-US US Anaheim 15r91
1 NaN Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ... en-us None NaN ifIpBW
2 NaN Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20... en-US,en;q=0.5 US Fort Huachuca 10DaxOu
3 NaN Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S... en-US US Houston TysVFU
4 NaN Opera/9.80 (Android; Opera Mini/7.5.33286/29.3... en None NaN 10IGW7m
5 NaN Mozilla/5.0 (compatible; MSIE 10.0; Windows NT... en-US US Mishawaka 13GrCeP
6 NaN Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G... en-US,en;q=0.5 US Hammond YmtpnZ
7 NaN Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li... en-us None NaN 13oM0hV
8 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ... en-us AU Sydney 15r91
9 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... en-US,en;q=0.8 None NaN 109LtDc
10 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ... en-us US Middletown 109ar5F
11 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ... en-us US Germantown 107xZnW
It looks like pandas and NumPy treat None specially when comparing for equality. In pandas, None is treated like NaN: both represent a missing value, so an element-wise comparison against None does not behave like an ordinary equality test. To find the rows where the value is not None (or NaN), you can use data_df[data_df.c.notnull()] (or, equivalently, data_df[~data_df.c.isnull()]).
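As a rough illustration of the approach above, here is a minimal sketch on a small made-up frame. The column names c and g come from the question, but the rows are an assumption built for this example, not the asker's real data. It relies only on Series.notnull() and Series.isnull(), and deliberately makes no claim about what data_df.c != None returns, since that comparison is the unreliable part.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a country column with both None and NaN as missing values.
data_df = pd.DataFrame({
    "c": ["US", None, "US", np.nan, "AU"],
    "g": ["15r91", "ifIpBW", "10DaxOu", "10IGW7m", "13GrCeP"],
})

# Keep only the rows where a country is present (neither None nor NaN).
countries_df = data_df[data_df.c.notnull()]

# The complementary selection: rows where the country is missing.
missing_df = data_df[data_df.c.isnull()]

print(countries_df)
print(missing_df)
```

Both None and NaN are caught by the same mask, so there is no need to test for each one separately.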