Level in pandas concat
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])
>>> df1
   one  two
a    0    1
b    2    3
c    4    5
>>> df2
   three  four
a      5     6
c      7     8
res = pd.concat([df1, df2], axis=1, levels=['level1', 'level2'],
                names=['upper', 'lower'])
>>> res
   one  two  three  four
a    0    1      5     6
b    2    3    NaN   NaN
c    4    5      7     8
My question is: why do the levels and names not show up in the res output above? Is there a real-world example of using the levels parameter?
Thanks for your time and help.
A really interesting question.
I searched Stack Overflow, but I have never used this parameter myself.
The docs do include one example, with this note:
Yes, this is fairly esoteric, but it is actually necessary for implementing things like GroupBy where the order of a categorical variable is meaningful.
The docs also say:
levels : list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.
So levels only takes effect together with keys. The call in the question passes no keys, so no MultiIndex is built and there is nothing for levels or names to show up in. With keys, levels lets you add extra values to a level of the MultiIndex:
res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']],
                names=['upper', 'lower'])
print(res)
upper level1     level2
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0
print(res.columns)
MultiIndex(levels=[['level1', 'level2', 'level3'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
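To see what the extra entry in levels actually does, it may help to compare the values declared for the outer level with the values that really label a column. The following is only a minimal sketch: the variable names declared, used and trimmed are mine, and it assumes a pandas version that provides MultiIndex.remove_unused_levels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']],
                names=['upper', 'lower'])

# Every value declared for the outer level, including the unused 'level3'.
declared = list(res.columns.levels[0])

# Only the values that actually label at least one column.
used = list(res.columns.get_level_values('upper').unique())

# remove_unused_levels() returns a new MultiIndex without 'level3'.
trimmed = list(res.columns.remove_unused_levels().levels[0])

print(declared)
print(used)
print(trimmed)
```

So 'level3' is stored in the index metadata but never appears in the printed frame, which is why the two results look identical until you inspect res.columns.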
Without the levels parameter:
res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                names=['upper', 'lower'])
print(res)
upper level1     level2
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0
print(res.columns)
MultiIndex(levels=[['level1', 'level2'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
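As for a practical use, the part that tends to matter day to day is keys and names rather than levels: they record which source frame each column came from, so you can select by it later. A small sketch, where from_df1 and three are illustrative names of my own:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                names=['upper', 'lower'])

# Selecting on the outer ('upper') level returns the columns of one source frame.
from_df1 = res['level1']

# xs() selects on the inner ('lower') level by its name and drops that level.
three = res.xs('three', axis=1, level='lower')

print(from_df1)
print(three)
```

Here res['level1'] gives back the columns that came from df1, and the xs call picks out the 'three' column while keeping the 'upper' label that says it came from the second frame.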