How to get a correctly formatted index in a pandas DataFrame
Having a data frame like this:
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
...                    'colx': [1, 2, 3, 4],
...                    'coly': [5, 6, 7, 8]})
>>> df.set_index('name', inplace=True)
>>> df
      colx  coly
name
foo      1     5
foo      2     6
bar      3     7
bar      4     8
how can I get a correctly formatted index like this:
      colx  coly
name
foo      1     5
         2     6
bar      3     7
         4     8
so that pandas doesn't complain about duplicate indexes.
1 answer
One option (among many) would be to add a new index level:
In [49]: df = df.set_index(df.groupby(level=0).cumcount().add(1).rename('num'),
    ...:                   append=True)
In [50]: df
Out[50]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8
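As a self-contained check of the approach above, here is a minimal sketch that rebuilds the frame from the question, appends the per-group counter as a second level, and confirms that the resulting index is unique. The variable names `num` and `df2` are only illustrative, and `rename('num')` is used here as a shorter way to name the counter Series.

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# Number the rows within each 'name' group: 1, 2, 1, 2
num = df.groupby(level=0).cumcount().add(1).rename('num')

# Append the counter as a second index level (illustrative name: df2)
df2 = df.set_index(num, append=True)

print(df.index.is_unique)    # False: 'foo' and 'bar' each appear twice
print(df2.index.is_unique)   # True: every (name, num) pair is distinct
print(df2.loc[('foo', 2), 'colx'])  # 2
```

With a unique (name, num) index, a single row can be addressed by its full key, as the last line shows.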
UPDATE: This should not be confused with how Pandas displays duplicates in a multi-index: if we select all values of the name multi-index level, we will still see duplicates:
In [51]: df.index.get_level_values(0)
Out[51]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object', name='name')
This is just the way Pandas presents duplicates in a multi-index. We can turn off this display option:
In [53]: pd.options.display.multi_sparse = False
In [54]: df
Out[54]:
          colx  coly
name num
foo  1       1     5
foo  2       2     6
bar  1       3     7
bar  2       4     8
In [55]: pd.options.display.multi_sparse = True
In [56]: df
Out[56]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8
PS: this option does not change the index values; it only affects how multi-indices are displayed.
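If the global setting should stay untouched, the same display switch can be applied temporarily. The following is a sketch, assuming the default value of display.multi_sparse (True) is in effect; it uses pd.option_context, and the names `sparse_view`, `dense_view` and `names` are only illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('foo', 1), ('foo', 2), ('bar', 1), ('bar', 2)],
    names=['name', 'num'])
df = pd.DataFrame({'colx': [1, 2, 3, 4], 'coly': [5, 6, 7, 8]}, index=idx)

# Default display: repeated outer labels are left blank
sparse_view = repr(df)

# Temporarily show every label; the previous setting is restored on exit
with pd.option_context('display.multi_sparse', False):
    dense_view = repr(df)

# The stored index values are the same in both cases
names = list(df.index.get_level_values('name'))

print(sparse_view)
print(dense_view)
print(names)
```

Only the printed representation differs between the two views; the index itself still holds 'foo' and 'bar' twice each.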