Rename duplicate pandas DataFrame index values

I have a DataFrame that contains some duplicate index values:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(6, 6), columns=pd.date_range('1/1/2010', periods=6), index=["A", "C", "B", "F", "D", "E"])  # a list (a set is unordered), ordered to match the output below
df1.rename(index={"C": "A", "B": "E"}, inplace=True)

ipdb> df1
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
 A   -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
 A   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
 E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
 F    1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
 D    0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
 E   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

      

I would like to rename only the duplicated index values, so that the DataFrame looks like this:

ipdb> df1
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A      -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A_dp   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E      -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F       1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D       0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E_dp   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

      

My approach:

(i) Create a dictionary with new names

old_names = df1[df1.index.duplicated()].index.values
new_names = df1[df1.index.duplicated()].index.values + "_dp"
dictionary = dict(zip(old_names, new_names))

      

(ii) Rename only duplicate values

df1.loc[df1.index.duplicated(),:].rename(index = dictionary, inplace = True)

      

However, this doesn't work.



3 answers


You can use Index.where:

df1.index = df1.index.where(~df1.index.duplicated(), df1.index + '_dp')
print (df1)
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A      -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A_dp   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E      -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F       1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D       0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E_dp   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560
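
As a minimal sketch of why the attempt in the question has no effect and why the boolean mask helps: `rename` maps by label, so a dictionary cannot tell the first "A" from the second, and calling it on a `.loc` selection would in any case only modify a temporary copy. The small frame `df` and the names `mapping`, `all_renamed`, and `mask` are illustrative stand-ins, not taken from the original post.

```python
import pandas as pd

# Small stand-in frame with the same duplicated labels as in the question.
df = pd.DataFrame({"x": range(6)}, index=["A", "A", "E", "F", "D", "E"])

# Label-based renaming hits every matching label, first occurrences included.
mapping = {"A": "A_dp", "E": "E_dp"}
all_renamed = df.rename(index=mapping)
print(list(all_renamed.index))  # ['A_dp', 'A_dp', 'E_dp', 'F', 'D', 'E_dp']

# A positional mask distinguishes first occurrences from later ones.
mask = df.index.duplicated()  # True only for the 2nd, 3rd, ... occurrence
new_index = df.index.where(~mask, df.index + "_dp")

# Assigning to df.index changes the original frame, not a copy.
df.index = new_index
print(list(df.index))  # ['A', 'A_dp', 'E', 'F', 'D', 'E_dp']
```

Note that a label occurring three or more times would get the same `_dp` suffix on every repeat, so the index is only guaranteed to become unique when each label appears at most twice.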

      



And if you need to make every duplicated index value unique:

print (df1)
   2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A   -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F    1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D    0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

df1.index = df1.index + df1.groupby(level=0).cumcount().astype(str).replace('0','')
print (df1)
    2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A    -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A1   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E    -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E1   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E2   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F     1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D     0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E3   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560
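
To make the intermediate steps visible, here is a rough breakdown of the one-liner on a small stand-in frame. The frame and the names `counts`, `suffixes`, and `new_labels` are assumptions for illustration; the final step combines labels and suffixes with a plain `zip`, which is a variation on the expression above rather than the same code.

```python
import pandas as pd

# Stand-in frame with the same label pattern as the example above.
df = pd.DataFrame({"x": range(8)}, index=["A", "A", "E", "E", "E", "F", "D", "E"])

# Running count per label: 0 for the first occurrence, 1 for the second, ...
counts = df.groupby(level=0).cumcount()
print(counts.tolist())  # [0, 1, 0, 1, 2, 0, 0, 3]

# Convert to strings; replace() matches whole values, so only the exact
# string "0" becomes empty (a count such as "10" is left alone).
suffixes = counts.astype(str).replace("0", "")

# Pair labels and suffixes by position and assign the result as the new index.
new_labels = [label + suffix for label, suffix in zip(df.index, suffixes)]
df.index = new_labels
print(list(df.index))  # ['A', 'A1', 'E', 'E1', 'E2', 'F', 'D', 'E3']
```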

      



I used Jezrael's excellent answer in this rename function:

def rn(df, suffix = '-duplicate-'):
    appendents = (suffix + df.groupby(level=0).cumcount().astype(str).replace('0','')).replace(suffix, '')
    return df.set_index(df.index + appendents)

      

then this:



df = pd.DataFrame({'a':[1,2,3,4,5,6,7,8, 9]}, index=['a'+str(i) for i in [1,2,3,3,4,3,5,5, 6]])
rn(df)

      

returns:

                a
a1              1
a2              2
a3              3
a3-duplicate-1  4
a4              5
a3-duplicate-2  6
a5              7
a5-duplicate-1  8
a6              9
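
One caveat worth guarding against: a generated label can collide with a label that already exists in the index. The following is a sketch of a variant that checks for this; the name `rn_checked` and the choice to raise a `ValueError` are assumptions made for illustration, not part of the answer above.

```python
import pandas as pd

def rn_checked(df, suffix="-duplicate-"):
    """Rename repeated index labels and fail loudly if the result is not unique."""
    # 0 for the first occurrence of a label, 1 for the second, and so on.
    counts = df.groupby(level=0).cumcount()
    # First occurrences keep their label; repeats get suffix + running number.
    labels = [f"{label}{suffix}{n}" if n else label
              for label, n in zip(df.index, counts)]
    out = df.copy()
    out.index = labels
    if not out.index.is_unique:
        raise ValueError("a generated label collides with an existing one")
    return out

df = pd.DataFrame({"a": [1, 2, 3]}, index=["a3", "a3", "a5"])
print(list(rn_checked(df).index))  # ['a3', 'a3-duplicate-1', 'a5']
```

With an index such as `['a3', 'a3', 'a3-duplicate-1']` the second row would be renamed to a label that is already taken, and the function raises instead of silently returning a non-unique index.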

      



What if there is a MultiIndex and I want to rename the duplicated values in the second level? For example, from

Index1  Index2  value
A       a       1
        b       2
        b       3
B       a       1

to

Index1  Index2  value
A       a       1
        b0      2
        b1      3
B       a       1
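
One possible way to do this, sketched under the assumption that the frame has exactly two index levels named `Index1` and `Index2` and a single `value` column (the variable names `dup`, `counts`, and `new_second` are illustrative): number the rows within each repeated pair with `cumcount` and rebuild the second level.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("A", "b"), ("B", "a")],
    names=["Index1", "Index2"],
)
df = pd.DataFrame({"value": [1, 2, 3, 1]}, index=idx)

# True for every row whose (Index1, Index2) pair occurs more than once.
dup = df.index.duplicated(keep=False)
# Position of each row within its pair: 0, 1, 2, ...
counts = df.groupby(level=[0, 1]).cumcount()

# Append the running number only to the repeated second-level labels.
second = df.index.get_level_values(1)
new_second = [f"{label}{n}" if is_dup else label
              for label, n, is_dup in zip(second, counts, dup)]

# Rebuild the MultiIndex with the first level unchanged.
df.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values(0), new_second],
    names=df.index.names,
)
print(df.index.tolist())
# [('A', 'a'), ('A', 'b0'), ('A', 'b1'), ('B', 'a')]
```

Using `keep=False` marks all members of a repeated pair, which is what produces `b0` and `b1` rather than `b` and `b1`.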
