How to get a correctly formatted index in a pandas DataFrame

Having a data frame like this:

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
...                    'colx': [1, 2, 3, 4],
...                    'coly': [5, 6, 7, 8]})
>>> df.set_index('name', inplace=True)
>>> df
      colx  coly
name            
foo      1     5
foo      2     6
bar      3     7
bar      4     8

      

how can I get a correctly formatted index, like this:

      colx  coly
name            
foo      1     5
         2     6
bar      3     7
         4     8

      

so that pandas doesn't complain about duplicate index values?



1 answer


One option (among many) would be to add a new index level that numbers the rows within each name:

In [49]: df = df.set_index(df.groupby(level=0).cumcount().add(1).rename('num'),
                           append=True)

In [50]: df
Out[50]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8
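As a self-contained sketch of the same idea (the names `num` and `df2` are only illustrative), the snippet below rebuilds the frame from the question, appends the per-name counter as a second index level, and checks that the index is unique afterwards:

```python
import pandas as pd

# Frame from the question, indexed by the repeated 'name' column
df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

unique_before = df.index.is_unique   # 'foo' and 'bar' each appear twice

# Number the rows within each name (1, 2, ...) and append that as a new level
num = df.groupby(level=0).cumcount().add(1).rename('num')
df2 = df.set_index(num, append=True)

unique_after = df2.index.is_unique   # each (name, num) pair occurs once
print(unique_before, unique_after)
print(df2)
```

The counter only makes each key distinct; whether 1, 2, ... is a meaningful second level depends on your data.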

      

UPDATE: This should not be confused with how pandas displays repeated values in a MultiIndex. If we select all values of the name level of the MultiIndex, we still see the duplicates:



In [51]: df.index.get_level_values(0)
Out[51]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object', name='name')
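To make the point concrete, here is a small sketch (the frame is rebuilt directly with a ('name', 'num') MultiIndex for brevity, which is an assumption for illustration rather than the exact steps above). Every row still carries its full key, and selection works on the values that the display blanks out:

```python
import pandas as pd

# Same data as above, built directly with a two-level index
df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'num': [1, 2, 1, 2],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index(['name', 'num'])

# Every row has a complete (name, num) key, even where the repr hides the name
keys = df.index.tolist()

# Selecting on the outer level returns both 'foo' rows
foo_rows = df.loc['foo']

# Selecting on the full key returns the single matching row as a Series
row = df.loc[('foo', 2)]

print(keys)
print(foo_rows)
print(row)
```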

      

This is just how pandas displays repeated outer-level values in a MultiIndex. The display.multi_sparse option controls this, and we can turn it off:

In [53]: pd.options.display.multi_sparse = False

In [54]: df
Out[54]:
          colx  coly
name num
foo  1       1     5
foo  2       2     6
bar  1       3     7
bar  2       4     8

In [55]: pd.options.display.multi_sparse = True

In [56]: df
Out[56]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8

      

PS: this option does not change the index values; it only affects how a MultiIndex is displayed.
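If you only need the non-sparse view once, a possible alternative to setting the option globally is `pd.option_context`, which restores the previous value when the block exits. A minimal sketch, assuming the option is at its default (sparse display on) beforehand; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'num': [1, 2, 1, 2],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index(['name', 'num'])

# Default rendering: repeated outer-level labels are left blank
sparse_view = df.to_string()

# Temporarily show every label; the option reverts when the block ends
with pd.option_context('display.multi_sparse', False):
    dense_view = df.to_string()

print(sparse_view)
print(dense_view)
```

The index itself is identical in both cases; only the rendered text differs.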
