Change data format using melt, stack and multi-index?
I'm new to Python and I have a dataframe that requires a bit of a tricky rebuild. This best describes an example using dummy data:
I have it:
Source data framework:
testdata = [('State', ['CA', 'FL', 'ON']),
('Country', ['US', 'US', 'CAN']),
('a1', [0.059485629, 0.968962817, 0.645435903]),
('b2', [0.336665658, 0.404398227, 0.333113735]),
('Test', ['Test1', 'Test2', 'Test3']),
('d', [20, 18, 24]),
('e', [21, 16, 25]),
]
df = pd.DataFrame.from_items(testdata)
I am the core of the data:
testdata2 = [('State', ['CA', 'CA', 'FL', 'FL', 'ON', 'ON']),
('Country', ['US', 'US', 'US', 'US', 'CAN', 'CAN']),
('Test', ['Test1', 'Test1', 'Test2', 'Test2', 'Test3', 'Test3']),
('Measurements', ['a1', 'b2', 'a1', 'b2', 'a1', 'b2']),
('Values', [0.059485629, 0.336665658, 0.968962817, 0.404398227, 0.645435903, 0.333113735]),
('Steps', [20, 21, 18, 16, 24, 25]),
]
dfn = pd.DataFrame.from_items(testdata2)
It looks like the solution probably requires the use of melt, stack and multi-index, but I'm not sure how to put all these files together.
Any suggested solutions are greatly appreciated.
Thank.
+3
source to share
2 answers
Try:
df1 = df.melt(id_vars=['State','Country','Test'],value_vars=['a1','b2'],value_name='Values',var_name='Measuremensts')
df2 = df.melt(id_vars=['State','Country','Test'],value_vars=['d','e'],value_name='Steps').drop('variable',axis=1)
df1.merge(df2, on=['State','Country','Test'], right_index=True, left_index=True)
Output:
State Country Test Measuremensts Values Steps
0 CA US Test1 a1 0.059486 20
1 FL US Test2 a1 0.968963 18
2 ON CAN Test3 a1 0.645436 24
3 CA US Test1 b2 0.336666 21
4 FL US Test2 b2 0.404398 16
5 ON CAN Test3 b2 0.333114 25
Or use @JohnGalt's solution:
pd.concat([pd.melt(df, id_vars=['State', 'Country', 'Test'], value_vars=x) for x in [['d', 'e'], ['a1', 'b2']]], axis=1)
+2
source to share
There is a way to do this with pd.wide_to_long
, but you must rename your columns so that the column Measurements
contains the correct values
df1 = df.rename(columns={'a1':'Values_a1', 'b2':'Values_b2', 'd':'Steps_a1', 'e':'Steps_b2'})
pd.wide_to_long(df1,
stubnames=['Values', 'Steps'],
i=['State', 'Country', 'Test'],
j='Measurements',
sep='_',
suffix='.').reset_index()
State Country Test Measurements Values Steps
0 CA US Test1 a1 0.059486 20
1 CA US Test1 b2 0.336666 21
2 FL US Test2 a1 0.968963 18
3 FL US Test2 b2 0.404398 16
4 ON CAN Test3 a1 0.645436 24
5 ON CAN Test3 b2 0.333114 25
+1
source to share