Using iloc to replace a column when there are identical names

Suppose I have the following DataFrame with some identical column names:

import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

and I want to fill the NaNs in the fourth column. I expected to be able to use iloc like this:

test.iloc[:, 3] = test.iloc[:, 3].fillna('F')

but it gives

In [121]: test
Out[121]:
   One  Two Three Three Three
0    1    2     F     F     F
1    1    2     4     4     4
2    1    2     F     F     F
3    1    2     4     4     4

So it picks the columns to change by their name rather than by their position. I could work around it naively like this:

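c = test.columns                          # remember the original labels
test.columns = range(len(test.columns))   # temporarily make every label unique
test.iloc[:, 3] = test.iloc[:, 3].fillna('F')
test.columns = c                          # restore the duplicate labels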

which gives the correct result

In [142]: test
Out[142]:
   One  Two  Three  Three  Three
0    1    2      3      F    NaN
1    1    2      3      4    5.0
2    1    2      3      F    NaN
3    1    2      3      4    NaN

but it seems a little inefficient for such a simple task.
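
The rename trick above can be wrapped in a small helper that works on a copy, so the caller's DataFrame and its duplicate labels are never touched. This is only a sketch: the name fill_column_at is hypothetical, and it still relabels the columns internally, so it makes the workaround tidier rather than faster.

```python
import numpy as np
import pandas as pd


def fill_column_at(df: pd.DataFrame, pos: int, value) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs only in the column at position `pos`.

    Works on a copy, so the original frame keeps its values and labels.
    """
    out = df.copy()
    labels = out.columns                  # remember the original (duplicate) labels
    out.columns = range(len(labels))      # temporarily relabel columns 0..n-1
    out[pos] = out[pos].fillna(value)     # label == position now, so only one column matches
    out.columns = labels                  # put the duplicate labels back
    return out


test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

result = fill_column_at(test, 3, 'F')
print(result)
```

Because the relabelling happens on a copy, test itself still contains its NaNs afterwards.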

My question, then, is twofold.

  • Is there an easier method?
  • Why doesn't the first one work? (Why is iloc still resorting to names when replacing columns?)


2 answers


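As for your second question, the reason the first method doesn't work may come down to the way pandas handles duplicate columns. The DataFrame constructor has no option for this, but read_csv does have a parameter, mangle_dupe_cols, whose default is True: duplicate columns are then named 'X', 'X.1', ... 'X.N' instead of all being 'X'. The documentation says that setting it to False can cause data to be overwritten when there are duplicate names. I suspect pandas handles duplicate columns in a questionable way. (mangle_dupe_cols has since been deprecated and removed in newer pandas versions; read_csv now always renames duplicate headers.)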
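
To see the renaming described above, here is a sketch that feeds the question's data to read_csv as CSV text (io.StringIO is just a stand-in for a real file, an assumption for the example). With the default behaviour the duplicate headers come back as Three, Three.1 and Three.2, and ordinary label-based assignment then touches only the intended column.

```python
import io

import pandas as pd

# The question's data as CSV text with a duplicated "Three" header;
# io.StringIO stands in for a real file here.
csv_text = """One,Two,Three,Three,Three
1,2,3,,
1,2,3,4,5
1,2,3,,
1,2,3,4,
"""

# By default read_csv renames repeated headers: Three, Three.1, Three.2
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))

# Every label is now unique, so plain label-based assignment
# changes the fourth column and nothing else.
df["Three.1"] = df["Three.1"].fillna("F")
print(df)
```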

iloc indexes the DataFrame object, while fillna operates on a Series, so it won't let you apply fillna the way you expect.

It is easier to simply replace the NaN values after indexing them:

test.iloc[test.iloc[:, 3].isnull().values, 3] = 'F'   # one iloc call: row mask + column position 3
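
A runnable sketch of the mask idea on the question's data, done in a single iloc call instead of chained indexing. Two assumptions that are not from the answer: the mask is converted with to_numpy(), because iloc does not accept a boolean Series as a row indexer, and the fill value is 0 rather than 'F' so that the float column is not upcast (recent pandas versions warn about putting strings into a float column in place).

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

col = 3                                       # fourth column, by position
mask = test.iloc[:, col].isna().to_numpy()    # plain boolean array, no labels attached
test.iloc[mask, col] = 0                      # assumed numeric fill value instead of 'F'
print(test)
```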

Or, alternatively (and closer to your original code), assign a plain array rather than a Series, so that no label alignment is involved:

test.iloc[:, 3] = test.iloc[:, 3].fillna('F').values
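
If you are on pandas 1.5 or later (an assumption about your environment; that is the version that added the method), DataFrame.isetitem addresses the first question directly: it sets the column at a given integer position without looking at column labels, and it always inserts a new array rather than writing into the old one, so the float column can become an object column holding 'F'. A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

# isetitem(loc, value) replaces the column at integer position `loc`;
# column labels are not consulted, so the other 'Three' columns are left alone.
test.isetitem(3, test.iloc[:, 3].fillna('F'))
print(test)
```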