Splitting a column in Python Pandas dataframe
How do I split a column in a pandas DataFrame into variable names in the column? I have a DataFrame below:
ID FEATURE PARAM VALUE
0 A101 U1 ITEM1 10
1 A101 U1 ITEM2 11
2 A101 U2 ITEM1 12
3 A101 U2 ITEM2 13
4 A102 U1 ITEM1 14
5 A102 U1 ITEM2 15
6 A102 U2 ITEM1 16
7 A102 U2 ITEM2 17
I want to split it like below.
ID FEATURE ITEM1 ITEM2
0 A101 U1 10 11
1 A101 U2 12 13
2 A102 U1 14 15
3 A102 U2 16 17
I tried using one of the answers and it works great, but partially.
Select_Data.groupby('PARAM')['VALUE'].apply(list).apply(pd.Series).T
PARAM ITEM1 ITEM2
0 10 11
1 12 13
2 14 15
3 16 17
But I have lost the ID and FEATURE columns and I want to store them in the table. I will greatly appreciate any suggestions.
source to share
You can also use pivot_table
with index ID,FEATURE
and then reset index ie
ndf = pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE']).reset_index()
If you want to concatenate duplicate values you can use mean
ndf = pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE'],aggfunc='mean').reset_index()
Output:
PARAM ID FEATURE ITEM1 ITEM2 0 A101 U1 10 11 1 A101 U2 12 13 2 A102 U1 14 15 3 A102 U2 16 17 In [528]:
source to share
You can use set_index
and unstack
:
df = df.set_index(['ID','FEATURE','PARAM'])['VALUE']
.unstack()
.reset_index()
.rename_axis(None, axis=1)
print (df)
ID FEATURE ITEM1 ITEM2
0 A101 U1 10 11
1 A101 U2 12 13
2 A102 U1 14 15
3 A102 U2 16 17
but if get:
ValueError: Index contains duplicate entries, cannot change shape
then use Bharath shetty's solution or groupby
and aggregate mean
as duplicates in triplets ID,FEATURE,PARAM
:
print (df)
ID FEATURE PARAM VALUE
0 A101 U2 ITEM1 50<-same A101,U2,ITEM1
1 A101 U1 ITEM2 11
2 A101 U2 ITEM1 12<-same A101,U2,ITEM1
3 A101 U2 ITEM2 13
4 A102 U1 ITEM1 14
5 A102 U1 ITEM2 15
6 A102 U2 ITEM1 16
7 A102 U2 ITEM2 17
df = df.groupby(['ID','FEATURE','PARAM'])['VALUE'].mean()
.unstack().reset_index().rename_axis(None, axis=1)
print (df)
ID FEATURE ITEM1 ITEM2
0 A101 U1 NaN 11.0
1 A101 U2 31.0 13.0<-(50+12)/2=31
2 A102 U1 14.0 15.0
3 A102 U2 16.0 17.0
source to share