Level in pandas concat

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

>>> df1
   one  two
a    0    1
b    2    3
c    4    5

>>> df2
   three  four
a      5     6
c      7     8


res = pd.concat([df1, df2], axis=1, levels=['level1', 'level2'],
        names=['upper', 'lower'])
>>> res
   one  two  three  four
a    0    1    5.0   6.0
b    2    3    NaN   NaN
c    4    5    7.0   8.0


My question is: why are the levels and names not showing up in the res output above? And is there a real-world example of using the levels parameter?

Thanks for your time and help.



A really interesting question.

I did some research on SO, but I have never used it myself.

But the docs have one example, with this note:

Yes, this is fairly esoteric, but it is actually necessary for implementing things like GroupBy where the order of a categorical variable is meaningful.

The docs also say:

levels : list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise, they will be inferred from the keys.
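Because the levels are the values the MultiIndex is built from, every key has to appear in them. A minimal sketch to check that, assuming a reasonably recent pandas (the exact error message may differ between versions); df1 and df2 are rebuilt so the snippet runs on its own, and key_missing_error is just an illustrative name:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

# 'level2' is used as a key but is missing from levels,
# so concat cannot map it to a code and refuses to build the MultiIndex.
try:
    pd.concat([df1, df2], axis=1,
              keys=['level1', 'level2'],
              levels=[['level1']])
    key_missing_error = None
except ValueError as exc:
    key_missing_error = exc

print(repr(key_missing_error))
```

Extra values in levels are allowed, though; only missing keys are an error.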

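So it adds new levels to the MultiIndex. Note that levels and names are only used together with keys; the call in the question passes no keys, so no MultiIndex is built and there is nothing for them to label: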

res = pd.concat([df1, df2], axis=1,
                keys=['level1','level2'], 
                levels=[['level1', 'level2','level3']], 
                names=['upper', 'lower'])

print (res)
upper level1     level2     
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0

print (res.columns)
MultiIndex(levels=[['level1', 'level2', 'level3'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
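The extra 'level3' is stored in the MultiIndex levels even though no column uses it. A small sketch to check this with MultiIndex.get_level_values and MultiIndex.remove_unused_levels (the latter assumes pandas 0.20 or newer):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']],
                names=['upper', 'lower'])

# The declared level values, including the unused 'level3'
print(list(res.columns.levels[0]))  # ['level1', 'level2', 'level3']
# The values actually present on each column
print(list(res.columns.get_level_values('upper')))
# ['level1', 'level1', 'level2', 'level2']
# Dropping level values that no column refers to
print(list(res.columns.remove_unused_levels().levels[0]))  # ['level1', 'level2']
```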


Without the levels parameter:

res = pd.concat([df1, df2], axis=1,
                keys=['level1','level2'], 
                names=['upper', 'lower'])

print (res)
upper level1     level2     
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0

print (res.columns)
MultiIndex(levels=[['level1', 'level2'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
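The key difference from the call in the question is keys: without it, concat just lines the columns up in a flat Index. A minimal sketch comparing the two (flat and keyed are illustrative names). Selecting by an outer key is one practical use, since it returns only the columns that came from that frame:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

# No keys: plain column labels, no hierarchy to name
flat = pd.concat([df1, df2], axis=1)
print(isinstance(flat.columns, pd.MultiIndex))  # False

# With keys: a two-level MultiIndex, and names labels both levels
keyed = pd.concat([df1, df2], axis=1,
                  keys=['level1', 'level2'],
                  names=['upper', 'lower'])
print(list(keyed.columns.names))  # ['upper', 'lower']

# Pick out only the columns that came from df2
print(keyed['level2'])
```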
