Converting pandas data from wide to long
I have a pandas.DataFrame with the following columns:
a_1 ab_1 ac_1 a_2 ab_2 ac_2
2 3 4 5 6 7
How do I convert it to the following?
a ab ac
2 3 4
5 6 7
I tried to use pandas melt to convert from wide to long format, but I am not sure about the syntax.
You can use split to turn the columns into a MultiIndex, then reshape with stack, and finally use reset_index to remove the MultiIndex from the index:
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(drop=True)
print (df)
a ab ac
0 2 3 4
1 5 6 7
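Alternatively, to keep the column suffixes (1, 2) as the index, drop only the first index level. Use this line in place of the stack line above, not after it:
df = df.stack().reset_index(level=0, drop=True)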
print (df)
a ab ac
1 2 3 4
2 5 6 7
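For reference, here is a self-contained sketch of both variants side by side, so neither overwrites the input of the other. The sample frame is built from the question's data, and the names stacked, by_position and by_suffix are only illustrative. Depending on the pandas version, stack may emit a FutureWarning about its new implementation.

```python
import pandas as pd

# Sample frame from the question: one row, six wide columns.
df = pd.DataFrame([range(2, 8)],
                  columns='a_1 ab_1 ac_1 a_2 ab_2 ac_2'.split())

# Split 'a_1' into ('a', '1') so the columns become a two-level MultiIndex.
df.columns = df.columns.str.split('_', expand=True)

# Move the last column level (the suffix) into the row index.
stacked = df.stack()

# Variant 1: discard the whole index and renumber rows from 0.
by_position = stacked.reset_index(drop=True)

# Variant 2: drop only the original row level and keep the suffix as index.
by_suffix = stacked.reset_index(level=0, drop=True)

print(by_position)
print(by_suffix)
```

Note that in the second variant the index holds the suffixes as strings ('1', '2'), because they come from splitting the column names.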
Here's one way to do it:
Code:
df.columns = pd.MultiIndex.from_tuples(
    [c.split('_') for c in df.columns], names=['col', 'row'])
df.melt().pivot(index='row', columns='col', values='value')
How?
- Create a pandas.MultiIndex for the columns by splitting the names on _.
- melt the data frame and then pivot on the elements from the original column names.
Test Code:
import pandas as pd

df = pd.DataFrame(
    data=[range(2, 8)],
    columns='a_1 ab_1 ac_1 a_2 ab_2 ac_2'.split()
)
print(df)
df.columns = pd.MultiIndex.from_tuples(
    [c.split('_') for c in df.columns], names=['col', 'row'])
print(df.melt().pivot(index='row', columns='col', values='value'))
Results:
a_1 ab_1 ac_1 a_2 ab_2 ac_2
0 2 3 4 5 6 7
col a ab ac
row
1 2 3 4
2 5 6 7
pandas < 0.20.0
DataFrame.melt() was added in pandas 0.20.0. On earlier versions, use the top-level pd.melt() function instead:
print(pd.melt(df).pivot(index='row', columns='col', values='value'))
If you want to use pnd.melt, you should probably use the parameters value_vars and value_name:
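import pandas as pnd

# Melt each group of columns separately, keeping only the value column.
df_a = pnd.melt(df, value_vars=['a_1', 'a_2'], value_name='a')[['a']]
df_ab = pnd.melt(df, value_vars=['ab_1', 'ab_2'], value_name='ab')[['ab']]
df_ac = pnd.melt(df, value_vars=['ac_1', 'ac_2'], value_name='ac')[['ac']]
# Join the three single-column frames on their shared index.
df_final = df_a.join(df_ab).join(df_ac)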
Alternatively, using a more functional approach:
from functools import reduce  # reduce is no longer a builtin in Python 3

col_prefixes = ['a', 'ab', 'ac']
df_cuts = map(lambda x: pnd.melt(df, value_vars=['%s_1' % x, '%s_2' % x], value_name=x)[[x]], col_prefixes)
df_final = reduce(lambda x, y: x.join(y), df_cuts)
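If listing every prefix and suffix by hand is inconvenient, a single melt call can also work. The following is only a sketch under the assumption that every column name has the form prefix_suffix; the names long_df, col, row and result are illustrative, and the sample frame is built from the question's data.

```python
import pandas as pd

df = pd.DataFrame([range(2, 8)],
                  columns='a_1 ab_1 ac_1 a_2 ab_2 ac_2'.split())

# Melt everything at once; the result has the columns 'variable' and 'value'.
long_df = df.melt()

# Split each original column name into its prefix and numeric suffix.
long_df[['col', 'row']] = long_df['variable'].str.split('_', expand=True)

# One output row per suffix, one output column per prefix.
result = long_df.pivot(index='row', columns='col', values='value')
print(result)
```

This avoids hard-coding the column groups, at the cost of relying on the naming pattern.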
There is a built-in function for this, pd.wide_to_long; see the documentation for more details:
In [115]: df
Out[115]:
a_1 ab_1 ac_1 a_2 ab_2 ac_2
0 2 3 4 5 6 7
In [116]: df['id'] = df.index
In [117]: df
Out[117]:
a_1 ab_1 ac_1 a_2 ab_2 ac_2 id
0 2 3 4 5 6 7 0
In [118]: pd.wide_to_long(df, ['a','ab','ac'],i='id',j='num',sep='_')
Out[118]:
a ab ac
id num
0 1 2 3 4
2 5 6 7
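The same session as a standalone script, in case it helps. This is a sketch that rebuilds the question's frame; the names long_df and flat are illustrative, and the final reset_index is an optional extra step for when the (id, num) index is not needed.

```python
import pandas as pd

df = pd.DataFrame([range(2, 8)],
                  columns='a_1 ab_1 ac_1 a_2 ab_2 ac_2'.split())

# wide_to_long needs a column that identifies each original row.
df['id'] = df.index

# The stubnames are the prefixes; sep='_' separates them from the numeric suffix.
long_df = pd.wide_to_long(df, stubnames=['a', 'ab', 'ac'],
                          i='id', j='num', sep='_')
print(long_df)

# Optional: drop the (id, num) MultiIndex to get a plain 0..n-1 index.
flat = long_df.reset_index(drop=True)
print(flat)
```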