What is this operation called: melting? Pivoting? Reshaping?

This is a question about using pandas and ggplot in Python, but an answer in R is also very welcome.

I am trying to plot some time-series data that looks something like the table below. X, Y, Z are well-plate identifiers (experiment names) and 0, 1, 2 are time points. I want to be able to plot the time course as a line chart, both in separate panels / subplots / facets and in a single panel with different colors.

   X    Y    Z
0  0.1  0.2  0.3
1  1.1  1.2  1.3
2  2.1  2.2  2.3

      

I know that pandas combined with ggplot would let me say

from ggplot import *
ggplot(aes(x='T', y='value', color='well'), data = df) + geom_line()

      

or

from ggplot import *
ggplot(aes(x='T', y='value'), data = df) + geom_line() + facet_grid(x='well')

      

if df looks like this:

well  T   value
X     0   0.1
X     1   1.1
X     2   2.1
Y     0   0.2
Y     1   1.2
Y     2   2.2
Z     0   0.3
Z     1   1.3
Z     2   2.3  

      

and the rows are not necessarily in that order.

My question is: how do I convert the first dataframe (above the code) into the second one (below the code), and what is this operation called? Again, I'm mainly interested in pandas and ggplot in Python, but an answer showing how this can be done in R would also be very helpful.

I would also appreciate it if someone could recommend a good resource for learning about this kind of data manipulation.



2 answers


This is called reshaping the data frame. The usual techniques are pivoting and melting, and the relevant methods include stack and unstack, melt, pivot, pivot_table, and a few others.

To go from "wide" to "long"

print(df)

     X    Y    Z
0  0.1  0.2  0.3
1  1.1  1.2  1.3
2  2.1  2.2  2.3

      

You can convert to a long data frame with methods such as stack and reset_index:

df2 = df.stack().reset_index()
df2.columns = ['T','well','value']
print(df2)

   T well  value
0  0    X    0.1
1  0    Y    0.2
2  0    Z    0.3
3  1    X    1.1
4  1    Y    1.2
5  1    Z    1.3
6  2    X    2.1
7  2    Y    2.2
8  2    Z    2.3
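As a self-contained sketch of the stack approach above, the column names can also be set by naming the two axes before stacking, instead of assigning df2.columns afterwards. The name long_df and the literal data (copied from the question) are only for illustration:

```python
import pandas as pd

# Wide frame from the question: rows are time points, columns are wells.
df = pd.DataFrame(
    {"X": [0.1, 1.1, 2.1], "Y": [0.2, 1.2, 2.2], "Z": [0.3, 1.3, 2.3]}
)

# Name the row index "T" and the column index "well", stack the columns
# into the index, name the resulting Series "value", then flatten it.
long_df = (
    df.rename_axis(index="T", columns="well")
      .stack()
      .rename("value")
      .reset_index()
)
print(long_df)
```

This way the headers come from the axis names, so nothing depends on the order in which reset_index emits the columns.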

      

Or using melt (var_name is needed to get a column called well; otherwise it is named variable):

df.reset_index().rename(columns={'index':'T'}).melt(id_vars='T', var_name='well').sort_values(by='T')

   T well  value
0  0    X    0.1
3  0    Y    0.2
6  0    Z    0.3
1  1    X    1.1
4  1    Y    1.2
7  1    Z    1.3
2  2    X    2.1
5  2    Y    2.2
8  2    Z    2.3
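A minimal runnable sketch of the melt route, assuming the same data as in the question. Without the final sort, melt emits the rows well by well (all of X, then Y, then Z), which happens to be the layout the question asks for. The name long_df is illustrative:

```python
import pandas as pd

# Wide frame from the question: rows are time points, columns are wells.
df = pd.DataFrame(
    {"X": [0.1, 1.1, 2.1], "Y": [0.2, 1.2, 2.2], "Z": [0.3, 1.3, 2.3]}
)

long_df = (
    df.rename_axis("T")   # give the row index a name
      .reset_index()      # turn it into a regular column called "T"
      .melt(id_vars="T", var_name="well", value_name="value")
)

# Reorder the columns to match the layout shown in the question.
long_df = long_df[["well", "T", "value"]]
print(long_df)
```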

      

To go from 'long' to 'wide'



print(df2)

   T well  value
0  0    X    0.1
3  0    Y    0.2
6  0    Z    0.3
1  1    X    1.1
4  1    Y    1.2
7  1    Z    1.3
2  2    X    2.1
5  2    Y    2.2
8  2    Z    2.3

      

Using pivot:

df2.pivot(index='T',columns='well')

     value          
well     X    Y    Z
T                   
0      0.1  0.2  0.3
1      1.1  1.2  1.3
2      2.1  2.2  2.3

      

Using set_index and unstack:

df2.set_index(['T','well']).unstack()

     value          
well     X    Y    Z
T                   
0      0.1  0.2  0.3
1      1.1  1.2  1.3
2      2.1  2.2  2.3

      

Using pivot_table:

pd.pivot_table(df2,aggfunc='mean',index='T',columns='well')

     value          
well     X    Y    Z
T                   
0      0.1  0.2  0.3
1      1.1  1.2  1.3
2      2.1  2.2  2.3
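All three outputs above carry an extra value level in the column header. As a rough sketch, passing values='value' should give flat X / Y / Z columns instead. The second half illustrates the practical difference between pivot and pivot_table: pivot raises on duplicate (T, well) pairs, while pivot_table aggregates them. The names long_df, wide, dup and averaged, and the duplicated measurement, are made up for illustration:

```python
import pandas as pd

# Long frame as produced by the wide-to-long examples.
long_df = pd.DataFrame({
    "T": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "well": list("XYZ") * 3,
    "value": [0.1, 0.2, 0.3, 1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
})

# Selecting a single values column avoids the extra "value" header level.
wide = long_df.pivot(index="T", columns="well", values="value")
wide.columns.name = None  # drop the leftover "well" label on the columns
print(wide)

# Hypothetical repeated measurement for well X at T=0.
dup = pd.concat(
    [long_df, long_df.iloc[[0]].assign(value=0.3)], ignore_index=True
)

# pivot cannot decide which of the two values to keep, so it raises.
try:
    dup.pivot(index="T", columns="well", values="value")
except ValueError as err:
    print("pivot failed:", err)

# pivot_table combines the duplicates with the aggregation function.
averaged = dup.pivot_table(
    index="T", columns="well", values="value", aggfunc="mean"
)
print(averaged)
```

If every (T, well) pair is unique, pivot is the stricter choice because it fails loudly on unexpected duplicates.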

      



Try this (transposing first puts the rows in well order, as in the question):



df1 = df.T.stack().reset_index().rename(columns = {'level_0': 'well', 'level_1': 'T', 0: 'value'})


    well    T   value
0   X       0   0.1
1   X       1   1.1
2   X       2   2.1
3   Y       0   0.2
4   Y       1   1.2
5   Y       2   2.2
6   Z       0   0.3
7   Z       1   1.3
8   Z       2   2.3

      
