Pandas MultiIndex keeps all of its levels after slicing a DataFrame

I think there is a conceptual problem in how a MultiIndex is built for a slice of a DataFrame. Consider the following code:

import cufflinks as cf
from pandas import MultiIndex

# df is assumed to be an existing DataFrame with six columns
df.columns = MultiIndex.from_tuples([('Iter1','a'), ('Iter1','b'),
                                     ('Iter2','c'), ('Iter2','d'),
                                     ('Iter3','e'), ('Iter3','f')])


This creates a simple frame with multi-indexed columns.

Slice this dataframe:

new_df = df[['Iter1','Iter2']].copy()



The sliced data looks correct, but behind the scenes the complete set of index levels still exists:

In [52]: new_df.columns
MultiIndex(levels=[[u'Iter1', u'Iter2', u'Iter3'], [u'a', u'b', u'c', u'd', u'e', u'f']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])


This seems like a bug to me, because when you try to access the last column of the sliced data, it returns nothing:

In [54]: last_col = new_df.columns.levels[0][-1]



I would like to pass a subset of the top-level columns, sliced from my original DataFrame, to a function, but there seems to be no way to access those columns programmatically.



1 answer

You need remove_unused_levels, which is new functionality in pandas 0.20.0. You can also check the docs.




import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples([('Iter1','a'), ('Iter1','b'),
                                  ('Iter2','c'), ('Iter2','d'),
                                  ('Iter3','e'), ('Iter3','f')])
idx = pd.date_range('2015-01-01', periods=5)
df = pd.DataFrame(np.random.rand(5,6), columns=cols, index=idx)
print (df)
               Iter1               Iter2               Iter3          
                   a         b         c         d         e         f
2015-01-01  0.517298  0.946963  0.765460  0.282396  0.221045  0.686222
2015-01-02  0.167139  0.392442  0.618052  0.411930  0.002465  0.884032
2015-01-03  0.884948  0.300410  0.589582  0.978427  0.845094  0.065075
2015-01-04  0.294744  0.287934  0.822466  0.626183  0.110478  0.000529
2015-01-05  0.942166  0.141501  0.421597  0.346489  0.869785  0.428602


new_df = df[['Iter1','Iter2']].copy()
print (new_df)
               Iter1               Iter2          
                   a         b         c         d
2015-01-01  0.517298  0.946963  0.765460  0.282396
2015-01-02  0.167139  0.392442  0.618052  0.411930
2015-01-03  0.884948  0.300410  0.589582  0.978427
2015-01-04  0.294744  0.287934  0.822466  0.626183
2015-01-05  0.942166  0.141501  0.421597  0.346489

print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2', 'Iter3'], ['a', 'b', 'c', 'd', 'e', 'f']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

print (new_df.columns.remove_unused_levels())
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

new_df.columns = new_df.columns.remove_unused_levels()

print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
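
To tie this back to the question, here is a minimal sketch of two ways to get the last top-level column that is actually present in the slice. The variable names present, last_col and last_col_alt are only illustrative, and the frame is rebuilt with random data so the snippet runs on its own. The second option uses get_level_values, which reads the labels in use and so does not depend on the unused levels having been removed.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                  ('Iter2', 'c'), ('Iter2', 'd'),
                                  ('Iter3', 'e'), ('Iter3', 'f')])
df = pd.DataFrame(np.random.rand(5, 6), columns=cols)
new_df = df[['Iter1', 'Iter2']].copy()

# Option 2 first: the top-level labels actually used by the columns,
# read before any cleanup of the index
present = new_df.columns.get_level_values(0).unique()
last_col_alt = present[-1]

# Option 1: drop the unused levels, then index into levels as in the question
new_df.columns = new_df.columns.remove_unused_levels()
last_col = new_df.columns.levels[0][-1]

# Both names now point at a column group that exists in the slice
print(last_col, last_col_alt)
print(new_df[last_col].columns.tolist())
```

Note that levels holds the sorted unique labels, so levels[0][-1] is the last label in sort order, which is not necessarily the last column by position. Also, newer pandas versions print codes= instead of labels= in the MultiIndex repr.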



