Python .drop does not give the expected result

I have a dataframe called xxx. One column of xxx is Final, and the dataframe looks like this:

  FpPropeTypCode DTE_DATE_DEATH             Area         Final  
0             FP            NaN  Ame_MidEast_Lnd           NaN  
1             FP            NaN  Southern_Europe  W.E.M. Lines  
2             FP            NaN              NaN           NaN  
3             ZP            NaN  Ame_MidEast_Lnd           NaN  
4             YY            NaN  Ame_MidEast_Lnd           NaN  

      

I would like to remove all rows with NaN in Final, so I did

xxx = xxx.drop(pd.isnull(data_file_fp4['Final']))

Sadly, I got

  FpPropeTypCode DTE_DATE_DEATH             Area                         Final  
2             FP            NaN              NaN                           NaN  
3             ZP            NaN  Ame_MidEast_Lnd                           NaN  
4             YY            NaN  Ame_MidEast_Lnd                           NaN  
5             NN            NaN  Ame_MidEast_Lnd  NORTH ARM TRANSPORTATION LTD  
6             CP            NaN  Northern_Europe                     MPC Group 

      

which is obviously wrong ...

What I really need to do is drop rows based on two conditions: Final is NaN and Area is Ame_MidEast_Lnd. So I cannot simply use dropna.

What was wrong in my code for the first condition? Thanks in advance.



1 answer


Are you using pandas? Pandas has a feature that will let you drop rows based on criteria, in this case where a specific column is NaN: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

The specific command you're looking for will probably look something like this:

xxx = xxx.dropna(axis=0, subset=['Final'])

      

axis=0 indicates that you want to drop rows, not columns; subset indicates which column(s) to check, here dropping rows where "Final" is NaN.
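To see this in isolation, here is a minimal, self-contained sketch of that dropna call, using a toy dataframe shaped like the one in the question (column names copied from the question):

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's dataframe
xxx = pd.DataFrame(
    [['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
     ['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines']],
    columns=['FpPropeTypCode', 'DTE_DATE_DEATH', 'Area', 'Final'])

# Drop rows (axis=0) whose 'Final' column is NaN
xxx = xxx.dropna(axis=0, subset=['Final'])
print(xxx)  # only the 'W.E.M. Lines' row remains
```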

EDIT: The asker cannot use dropna because their filter logic is more complex.

If you need more complex logic, you might be better off with boolean indexing (a bracket mask). You can try something like this:



xxx = xxx[~xxx['Final'].isnull()]

      
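As a small aside, pandas also offers notnull(), which expresses the same filter without the negation; a minimal sketch:

```python
import numpy as np
import pandas as pd

xxx = pd.DataFrame({'Final': ['W.E.M. Lines', np.nan, 'MPC Group']})

# Keep rows where 'Final' is not NaN; equivalent to xxx[~xxx['Final'].isnull()]
kept = xxx[xxx['Final'].notnull()]
print(kept)  # the NaN row is gone
```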

If you want the second piece of logic, combining the NaN filter with the column filter, you can do this (na=False keeps the mask strictly boolean when Area itself is NaN):

xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd", na=False))]

      
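One caveat worth knowing: by default, str.contains() returns NaN for NaN entries, so the combined mask is no longer strictly boolean; passing na=False treats missing values as "no match". A small sketch:

```python
import numpy as np
import pandas as pd

area = pd.Series(['Ame_MidEast_Lnd', np.nan, 'Southern_Europe'])

# Default: the NaN entry propagates as NaN, so the result is not strictly boolean
print(area.str.contains('Ame_MidEast_Lnd').tolist())
# With na=False the NaN entry becomes False and the mask is safe for indexing
print(area.str.contains('Ame_MidEast_Lnd', na=False).tolist())  # [True, False, False]
```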

I verified that it works by running the Python file below:

import pandas as pd
import numpy as np

xxx = pd.DataFrame([
                    ['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
                    ['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines'],
                    ['FP', np.nan, np.nan, np.nan],
                    ['ZP', np.nan, 'Ame_MidEast_Lnd', np.nan],
                    ['YY', np.nan, 'Ame_MidEast_Lnd', np.nan]],
                   columns=['FpPropeTypCode','DTE_DATE_DEATH','Area', 'Final']
                   )

# before
print(xxx)

# whatever rows have both 'Final' as NaN and 'Area' containing Ame_MidEast_Lnd, we do NOT want those rows
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd", na=False))]

# after
print(xxx)

      

You will see that the solution works the way you want.
