Using iloc to replace a column when there are identical names
Suppose I have the following DataFrame with some identical column names
import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3, 4, 5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3, 4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])
and I want to fill the NaNs in the fourth column. I expected to be able to use iloc like this:
test.iloc[:, 3] = test.iloc[:, 3].fillna('F')
but it gives
In [121]: test
Out[121]:
   One  Two Three Three Three
0    1    2     F     F     F
1    1    2     4     4     4
2    1    2     F     F     F
3    1    2     4     4     4
So the assignment is applied by column name rather than by position. I could work around it very naively like this:
c = test.columns
test.columns = range(len(test.columns))
test.iloc[:, 3] = test.iloc[:, 3].fillna('F')
test.columns = c
which gives the correct result
In [142]: test
Out[142]:
   One  Two  Three Three  Three
0    1    2      3     F    NaN
1    1    2      3     4    5.0
2    1    2      3     F    NaN
3    1    2      3     4    NaN
but that seems a little inefficient for such a simple task.
My question then is twofold.
- Is there an easier method?
- Why doesn't the first one work? (Why is iloc still resorting to names when replacing columns?)
As for the second question, the first method probably fails because of the way pandas handles duplicate column names. The DataFrame constructor has no option to control this, but the read_csv documentation does describe a parameter, mangle_dupe_cols, which defaults to True. The documentation says that passing False can cause data to be overwritten. I suspect pandas handles duplicate columns in a questionable way.
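To illustrate what that default does, here is a minimal sketch. It assumes a small in-memory CSV (the csv_text string is made up for illustration) and relies only on read_csv's default behaviour, without passing mangle_dupe_cols explicitly:

```python
import io

import pandas as pd

# Hypothetical CSV text with the same duplicated header as in the question.
csv_text = "One,Two,Three,Three,Three\n1,2,3,4,5\n"

# With default settings, read_csv renames repeated headers so that
# every column label ends up unique.
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))
# ['One', 'Two', 'Three', 'Three.1', 'Three.2']
```

With unique labels like these, label-based and position-based assignment refer to the same single column, so the ambiguity from the question does not arise.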
iloc indexes the DataFrame object, whereas fillna is looking for a Series, so it won't let you apply fillna.
An easier way is to simply replace the NaN values after indexing them:
test.iloc[:, 3][test.iloc[:, 3].isnull()] = 'F'
Or, alternatively (and closer to your original code), actually select the column:
test.iloc[:, 3]['Three'] = test.iloc[:, 3]['Three'].fillna('F')
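If assigning into the frame keeps misbehaving with duplicate names, one way to sidestep assignment altogether is to fill the column as a separate Series and then reassemble the frame by position with pd.concat. This is only a sketch under that assumption, not necessarily the most efficient option; the names pos, filled and result are chosen here for illustration:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3, 4, 5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3, 4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

pos = 3  # position of the column to fill (the fourth column)

# Select the single column by position and fill it as a standalone Series.
filled = test.iloc[:, pos].fillna('F')

# Stitch the frame back together by position: columns before pos,
# the filled column, then the columns after pos.
result = pd.concat(
    [test.iloc[:, :pos], filled, test.iloc[:, pos + 1:]],
    axis=1,
)

print(result)
```

Because every piece is selected with iloc, the column names never take part in the selection, and the duplicated 'Three' labels are carried over unchanged.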