What is this called: melting? Pivoting? Reshaping?
This is a question about using pandas and ggplot in Python, but an answer in R is also very welcome.
I am trying to plot some time-course data that looks something like the table below. X, Y, Z are well-plate identifiers (experiment names) and 0, 1, 2 are different times. I want to be able to plot the time course as a line chart both in different panels / subplots / facets and in the same facet, but with different colors.
     X    Y    Z
0  0.1  0.2  0.3
1  1.1  1.2  1.3
2  2.1  2.2  2.3
I know that pandas combined with ggplot would let me say
from ggplot import *
ggplot(aes(x='T', y='value', color='well'), data = df) + geom_line()
or
from ggplot import *
ggplot(aes(x='T', y='value'), data = df) + geom_line() + facet_grid(x='well')
if df looks like this:
well T value
X 0 0.1
X 1 1.1
X 2 2.1
Y 0 0.2
Y 1 1.2
Y 2 2.2
Z 0 0.3
Z 1 1.3
Z 2 2.3
and the rows are not necessarily in that order.
My question is: how do I turn the first dataframe above into the second one shown below the code, and what is this operation called? Again, I'm mainly interested in pandas and ggplot in Python, but an answer showing how this can be done in R would also be very helpful.
I would also appreciate it if someone could recommend a good resource for learning about this kind of data manipulation.
This is called reshaping the data frame. Depending on the direction it is referred to as pivoting or melting, and the relevant pandas methods include stack and unstack, melt, pivot, and pivot_table.
To go from "wide" to "long"
print(df)
     X    Y    Z
0  0.1  0.2  0.3
1  1.1  1.2  1.3
2  2.1  2.2  2.3
You can convert it into a long data frame with methods such as stack and reset_index:
df2 = df.stack().reset_index()
df2.columns = ['T','well','value']
print(df2)
T well value
0 0 X 0.1
1 0 Y 0.2
2 0 Z 0.3
3 1 X 1.1
4 1 Y 1.2
5 1 Z 1.3
6 2 X 2.1
7 2 Y 2.2
8 2 Z 2.3
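As a self-contained variation on the stack approach above, here is a minimal sketch that rebuilds the example table and names the axes before stacking, so the column labels do not have to be assigned afterwards. The variable name long_df is only illustrative, and the rename_axis(index=..., columns=...) form assumes a reasonably recent pandas version.

```python
import pandas as pd

# Rebuild the wide table from the question: rows are times, columns are wells.
df = pd.DataFrame(
    {"X": [0.1, 1.1, 2.1], "Y": [0.2, 1.2, 2.2], "Z": [0.3, 1.3, 2.3]},
    index=[0, 1, 2],
)

# Name the row index "T" and the column axis "well" first, so that
# stack() and reset_index() produce labelled columns directly.
long_df = (
    df.rename_axis(index="T", columns="well")
      .stack()            # Series indexed by (T, well)
      .rename("value")    # name of the resulting value column
      .reset_index()      # turn T and well back into ordinary columns
)
print(long_df)
```

The result should have the same rows as df2 above, with columns T, well and value.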
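Or using melt (var_name sets the name of the column that holds the old column labels; without it the column would be called 'variable'):
df.reset_index().rename(columns={'index':'T'}).melt(id_vars='T', var_name='well').sort_values(by='T')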
T well value
0 0 X 0.1
3 0 Y 0.2
6 0 Z 0.3
1 1 X 1.1
4 1 Y 1.2
7 1 Z 1.3
2 2 X 2.1
5 2 Y 2.2
8 2 Z 2.3
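The melt route can also be written as a standalone sketch with every output column named explicitly. This is only one way to arrange it; the name melted and the final sort and index reset are choices made here so that the rows line up with the stack result.

```python
import pandas as pd

# Same wide table as in the question.
df = pd.DataFrame({"X": [0.1, 1.1, 2.1], "Y": [0.2, 1.2, 2.2], "Z": [0.3, 1.3, 2.3]})

melted = (
    df.rename_axis("T")        # give the row index a name
      .reset_index()           # so it becomes an ordinary column called T
      .melt(id_vars="T", var_name="well", value_name="value")
      .sort_values(["T", "well"])
      .reset_index(drop=True)  # renumber rows 0..8 after sorting
)
print(melted)
```

Sorting by both T and well and then resetting the index gives the same row order and numbering as the stack-based df2.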
To go from 'long' to 'wide'
print(df2)
   T well  value
0  0    X    0.1
1  0    Y    0.2
2  0    Z    0.3
3  1    X    1.1
4  1    Y    1.2
5  1    Z    1.3
6  2    X    2.1
7  2    Y    2.2
8  2    Z    2.3
Using pivot:
df2.pivot(index='T',columns='well')
     value
well     X    Y    Z
T
0      0.1  0.2  0.3
1      1.1  1.2  1.3
2      2.1  2.2  2.3
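The output above has a two-level column header, with value on top, because no values column was specified. A minimal sketch of how one might get back the flat X, Y, Z columns of the original table, assuming the long frame has exactly the columns T, well and value (the names long_df and wide are illustrative):

```python
import pandas as pd

# Long-format table, equivalent to df2 above.
long_df = pd.DataFrame({
    "T": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "well": ["X", "Y", "Z"] * 3,
    "value": [0.1, 0.2, 0.3, 1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
})

# Passing values= selects a single column, so the result has plain
# single-level columns instead of a (value, well) MultiIndex.
wide = long_df.pivot(index="T", columns="well", values="value")
wide.columns.name = None  # drop the leftover "well" label on the column axis
print(wide)
```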
Using set_index and unstack:
df2.set_index(['T','well']).unstack()
     value
well     X    Y    Z
T
0      0.1  0.2  0.3
1      1.1  1.2  1.3
2      2.1  2.2  2.3
Using pivot_table:
pd.pivot_table(df2,aggfunc='mean',index='T',columns='well')
     value
well     X    Y    Z
T
0      0.1  0.2  0.3
1      1.1  1.2  1.3
2      2.1  2.2  2.3
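For the example data all three approaches give the same result. The practical difference shows up when a (T, well) pair occurs more than once: pivot refuses to reshape, while pivot_table aggregates the duplicates. The sketch below uses a small made-up table (dup_df is hypothetical and not part of the question's data) to illustrate this.

```python
import pandas as pd

# Hypothetical long table in which well X was measured twice at T = 0.
dup_df = pd.DataFrame({
    "T": [0, 0, 0, 1],
    "well": ["X", "X", "Y", "X"],
    "value": [0.1, 0.3, 0.2, 1.1],
})

# pivot cannot decide which of the duplicate values to keep and raises.
try:
    dup_df.pivot(index="T", columns="well", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table combines the duplicates with the given aggregation function.
wide_mean = dup_df.pivot_table(index="T", columns="well", values="value", aggfunc="mean")
print(pivot_failed)
print(wide_mean)
```

Here the two X readings at T = 0 are averaged, and the missing (T = 1, Y) combination shows up as NaN.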