Python Pandas: filtering a data frame

I'm new to Pandas but wanted to try it after working with R for a while.

The problem I am facing is figuring out why a filter is not working on one of my data frames. I have a data frame data_df with several columns, one of which, c, contains country names. I am trying to filter out the rows where c is None.

My first attempt was to do this:

countries_df = data_df[data_df.c != None]

      

However, this returned 0 rows. The following, on the other hand, worked:

countries_df = data_df[~data_df.c.isin([None])]

      

Can someone explain why? From the Pandas docs, it looks like the first one should filter correctly.

Here are a few sample rows:

    _heartbeat_  a                                                  al              c     cy             g
0   NaN          Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H...  en-US           US    Anaheim        15r91
1   NaN          Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...  en-us           None  NaN            ifIpBW
2   NaN          Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...  en-US,en;q=0.5  US    Fort Huachuca  10DaxOu
3   NaN          Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S...  en-US           US    Houston        TysVFU
4   NaN          Opera/9.80 (Android; Opera Mini/7.5.33286/29.3...  en              None  NaN            10IGW7m
5   NaN          Mozilla/5.0 (compatible; MSIE 10.0; Windows NT...  en-US           US    Mishawaka      13GrCeP
6   NaN          Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...  en-US,en;q=0.5  US    Hammond        YmtpnZ
7   NaN          Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li...  en-us           None  NaN            13oM0hV
8   NaN          Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...  en-us           AU    Sydney         15r91
9   NaN          Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...  en-US,en;q=0.8  None  NaN            109LtDc
10  NaN          Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...  en-us           US    Middletown     109ar5F
11  NaN          Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...  en-us           US    Germantown     107xZnW

      



1 answer


It looks like pandas and NumPy treat None specially when comparing for equality. In pandas, None is treated like NaN, representing a missing value. To find the rows where the value is not None (or NaN), you can use data_df[data_df.c.notnull()] (or data_df[~data_df.c.isnull()]).
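As a minimal sketch of the null-aware filters mentioned here, the snippet below builds a small stand-in for data_df (the sample values are made up for illustration, loosely based on the rows in the question) and keeps only the rows where c is present. It deliberately avoids an elementwise comparison against None, since what that returns may depend on the pandas version.

```python
import pandas as pd

# Hypothetical stand-in for data_df; only the columns needed for the demo.
data_df = pd.DataFrame({
    "c": ["US", None, "US", None, "AU"],
    "cy": ["Anaheim", None, "Houston", None, "Sydney"],
})

# Keep rows where the country is present (not None / NaN).
countries_df = data_df[data_df.c.notnull()]

# Equivalent form: negate the "is missing" mask.
same_df = data_df[~data_df.c.isnull()]

# Another equivalent option: drop rows with a missing value in column c.
dropped_df = data_df.dropna(subset=["c"])

print(countries_df)
```

All three results contain the same rows and keep the original index labels, so call reset_index(drop=True) afterwards if you need a fresh 0..n-1 index.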
