pandas .drop does not give expected result
I have a dataframe called xxx. One of its columns is Final, and the data looks like this:
FpPropeTypCode DTE_DATE_DEATH Area Final
0 FP NaN Ame_MidEast_Lnd NaN
1 FP NaN Southern_Europe W.E.M. Lines
2 FP NaN NaN NaN
3 ZP NaN Ame_MidEast_Lnd NaN
4 YY NaN Ame_MidEast_Lnd NaN
I would like to remove all rows where Final is NaN, so I did
xxx = xxx.drop(pd.isnull(data_file_fp4['Final']))
But instead I got
FpPropeTypCode DTE_DATE_DEATH Area Final
2 FP NaN NaN NaN
3 ZP NaN Ame_MidEast_Lnd NaN
4 YY NaN Ame_MidEast_Lnd NaN
5 NN NaN Ame_MidEast_Lnd NORTH ARM TRANSPORTATION LTD
6 CP NaN Northern_Europe MPC Group
which is obviously wrong ...
What I really need to do is drop rows based on two conditions: Final is NaN and Area is Ame_MidEast_Lnd. So I cannot simply use dropna.
What was wrong with my current code for the first condition? Thanks in advance.
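For reference, the likely reason the .drop call above misbehaves is that drop expects index labels, not a boolean mask: pd.isnull(...) returns a Series of True/False values, and since True and False compare equal to the labels 1 and 0, drop removes rows 0 and 1 instead of the NaN rows. If you do want to stay with drop, a sketch like this (reusing a cut-down version of the question's data) passes the matching index labels instead:

```python
import pandas as pd
import numpy as np

# A cut-down frame mirroring the question's data (column names from the question).
xxx = pd.DataFrame({
    'Area':  ['Ame_MidEast_Lnd', 'Southern_Europe', np.nan],
    'Final': [np.nan, 'W.E.M. Lines', np.nan],
})

# drop wants index labels, so first select the labels of the offending rows:
bad_rows = xxx[xxx['Final'].isnull()].index
xxx = xxx.drop(bad_rows)

print(xxx)  # only the 'W.E.M. Lines' row remains
```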
Are you using pandas? Pandas has a feature that lets you drop rows based on criteria, in this case where a specific column is NaN: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
The specific command you're looking for will probably look something like this:
xxx = xxx.dropna(axis=0, subset=['Final'])
axis=0 indicates that you want to drop rows, not columns. subset indicates which columns to consider, so rows where "Final" is NaN are dropped.
EDIT: The asker cannot use dropna because their filter logic is more complex.
If you need more complex logic, you might be better off building a boolean mask yourself. You can try something like this:
xxx = xxx[~xxx['Final'].isnull()]
If you want the second piece of logic, combining the NaN filter with the Area column filter, you can do this:
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]
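One caveat worth noting: when Area itself contains NaN (as in row 2 of the sample), str.contains returns NaN for that row rather than False. Pandas typically treats that NaN as False inside the & combination above, but passing na=False makes the intent explicit, as this small sketch shows:

```python
import pandas as pd
import numpy as np

# Sample Area values including a NaN, as in the question's data.
area = pd.Series(['Ame_MidEast_Lnd', 'Southern_Europe', np.nan])

# na=False turns the NaN entry into a clean False instead of NaN,
# so the result is a proper boolean mask.
mask = area.str.contains('Ame_MidEast_Lnd', na=False)
print(mask.tolist())  # [True, False, False]
```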
I verified that it works by running this python file below:
import pandas as pd
import numpy as np
xxx = pd.DataFrame([
['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines'],
['FP', np.nan, np.nan, np.nan],
['ZP', np.nan, 'Ame_MidEast_Lnd', np.nan],
['YY', np.nan, 'Ame_MidEast_Lnd', np.nan]],
columns=['FpPropeTypCode','DTE_DATE_DEATH','Area', 'Final']
)
# before
print(xxx)
# whatever rows have both 'Final' as NaN and 'Area' containing Ame_MidEast_Lnd, we do NOT want those rows
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]
# after
print(xxx)
You will see that the solution works the way you want.