Reshape a DataFrame using melt, stack and multi-index?

I'm new to Python and I have a DataFrame that needs a somewhat tricky reshape. It is easiest to describe with an example using dummy data: I have the source DataFrame below, and I need to turn it into the target DataFrame that follows it.

Source DataFrame:

import pandas as pd

testdata = [('State', ['CA', 'FL', 'ON']),
     ('Country', ['US', 'US', 'CAN']),
     ('a1', [0.059485629, 0.968962817, 0.645435903]),
     ('b2', [0.336665658, 0.404398227, 0.333113735]),
     ('Test', ['Test1', 'Test2', 'Test3']),
     ('d', [20, 18, 24]),
     ('e', [21, 16, 25]),
     ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
df = pd.DataFrame(dict(testdata))

      

Target DataFrame (the result I need):

testdata2 = [('State', ['CA', 'CA',  'FL', 'FL', 'ON', 'ON']),
     ('Country', ['US', 'US', 'US', 'US', 'CAN', 'CAN']),
     ('Test', ['Test1', 'Test1', 'Test2', 'Test2',  'Test3', 'Test3']),
     ('Measurements', ['a1', 'b2', 'a1', 'b2',  'a1', 'b2']),
     ('Values', [0.059485629, 0.336665658,  0.968962817, 0.404398227, 0.645435903, 0.333113735]),
     ('Steps', [20,  21, 18,  16, 24, 25]),
     ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
dfn = pd.DataFrame(dict(testdata2))

      

It looks like the solution probably requires melt, stack and a multi-index, but I'm not sure how to put all these pieces together.

Any suggested solutions are greatly appreciated.

Thanks.


2 answers


Try:

df1 = df.melt(id_vars=['State','Country','Test'],value_vars=['a1','b2'],value_name='Values',var_name='Measurements')
df2 = df.melt(id_vars=['State','Country','Test'],value_vars=['d','e'],value_name='Steps')
# Both melts return rows in the same order, so align on the index;
# recent pandas versions reject on= combined with left_index/right_index in merge
df1.join(df2['Steps'])

Output:

  State Country   Test Measurements    Values  Steps
0    CA      US  Test1           a1  0.059486     20
1    FL      US  Test2           a1  0.968963     18
2    ON     CAN  Test3           a1  0.645436     24
3    CA      US  Test1           b2  0.336666     21
4    FL      US  Test2           b2  0.404398     16
5    ON     CAN  Test3           b2  0.333114     25
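The rows above come out grouped by measurement, while the target frame in the question groups them by state. A minimal sketch of one way to get the requested order, assuming the two-melt approach and that the index of both melted frames lines up row for row (the names `ids` and `result` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'FL', 'ON'],
    'Country': ['US', 'US', 'CAN'],
    'a1': [0.059485629, 0.968962817, 0.645435903],
    'b2': [0.336665658, 0.404398227, 0.333113735],
    'Test': ['Test1', 'Test2', 'Test3'],
    'd': [20, 18, 24],
    'e': [21, 16, 25],
})

ids = ['State', 'Country', 'Test']  # illustrative helper name

# Melt the measurement columns and the step columns separately
df1 = df.melt(id_vars=ids, value_vars=['a1', 'b2'],
              var_name='Measurements', value_name='Values')
df2 = df.melt(id_vars=ids, value_vars=['d', 'e'], value_name='Steps')

# Attach Steps by position (index), then sort so each state's rows are adjacent
result = (df1.join(df2['Steps'])
             .sort_values(['State', 'Measurements'])
             .reset_index(drop=True))
print(result)
```

Sorting by State happens to reproduce the order in the question because the states are already alphabetical; for other data a different sort key may be needed.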

      

Or use @JohnGalt's solution:

pd.concat([pd.melt(df, id_vars=['State', 'Country', 'Test'], value_vars=x) for x in [['d', 'e'], ['a1', 'b2']]], axis=1)
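Note that this one-liner places the two melted frames side by side, so the id columns appear twice and the melted columns keep the default names `variable` and `value`. A rough sketch of one possible cleanup, assuming the column order produced by the list `[['d', 'e'], ['a1', 'b2']]` (the placeholder names such as `step_col` and `State_2` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'FL', 'ON'],
    'Country': ['US', 'US', 'CAN'],
    'a1': [0.059485629, 0.968962817, 0.645435903],
    'b2': [0.336665658, 0.404398227, 0.333113735],
    'Test': ['Test1', 'Test2', 'Test3'],
    'd': [20, 18, 24],
    'e': [21, 16, 25],
})

ids = ['State', 'Country', 'Test']  # illustrative helper name

wide = pd.concat(
    [pd.melt(df, id_vars=ids, value_vars=x) for x in [['d', 'e'], ['a1', 'b2']]],
    axis=1,
)

# The ten columns arrive as: ids, variable, value (steps),
# then ids, variable, value (measurements). Rename them by position.
wide.columns = (ids + ['step_col', 'Steps']
                + ['State_2', 'Country_2', 'Test_2', 'Measurements', 'Values'])

# Keep only the columns the target frame needs
result = wide[ids + ['Measurements', 'Values', 'Steps']]
print(result)
```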

      



There is a way to do this with pd.wide_to_long, but you must first rename your columns so that the Measurements column ends up with the correct values.



df1 = df.rename(columns={'a1':'Values_a1', 'b2':'Values_b2', 'd':'Steps_a1', 'e':'Steps_b2'})
# suffix is a regex that must match the whole suffix; r'\w+' covers 'a1' and 'b2'
pd.wide_to_long(df1, 
                stubnames=['Values', 'Steps'], 
                i=['State', 'Country', 'Test'], 
                j='Measurements', 
                sep='_', 
                suffix=r'\w+').reset_index()

  State Country   Test Measurements    Values  Steps
0    CA      US  Test1           a1  0.059486     20
1    CA      US  Test1           b2  0.336666     21
2    FL      US  Test2           a1  0.968963     18
3    FL      US  Test2           b2  0.404398     16
4    ON     CAN  Test3           a1  0.645436     24
5    ON     CAN  Test3           b2  0.333114     25
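Since the question also asks about stack and a multi-index, here is a hedged sketch of that route. It is not taken from either answer. It assumes the same pairing as the rename above (`d` belongs with `a1`, `e` with `b2`) and that the columns are in the order `a1, b2, d, e` once the id columns are moved into the index:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'FL', 'ON'],
    'Country': ['US', 'US', 'CAN'],
    'a1': [0.059485629, 0.968962817, 0.645435903],
    'b2': [0.336665658, 0.404398227, 0.333113735],
    'Test': ['Test1', 'Test2', 'Test3'],
    'd': [20, 18, 24],
    'e': [21, 16, 25],
})

ids = ['State', 'Country', 'Test']  # illustrative helper name

# Move the id columns into the index; the remaining columns are a1, b2, d, e
wide = df.set_index(ids)

# Give each remaining column a (kind, measurement) label
wide.columns = pd.MultiIndex.from_tuples(
    [('Values', 'a1'), ('Values', 'b2'), ('Steps', 'a1'), ('Steps', 'b2')],
    names=[None, 'Measurements'],
)

# Stack the measurement level into the rows, then flatten the index
result = wide.stack('Measurements').reset_index()

# Column order after stack can differ between pandas versions, so fix it explicitly
result = result[ids + ['Measurements', 'Values', 'Steps']]
print(result)
```

Compared with wide_to_long, this spells out the column pairing by hand, which is more verbose but avoids renaming columns just to fit a stub and suffix pattern.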

      
