Pandas, Python: Rotate some (31 day) dataframe columns and map them to existing (year, month) rows (NOAA data)
I have NOAA weather data. In its raw state, each row is a (year, month) pair and the days are columns. I want to expand the rows so that each row holds a year, month and day together with the corresponding data.
There is also a weather variable column, where each row represents one weather variable collected in that month. The number of weather variables collected per month may change. (In January there are two (tmax, tmin), in February there are three (tmax, tmin, prcp), and in March there is one (tmax).)
Here is an example df.
import numpy as np
import pandas as pd
example_df = pd.DataFrame({'station': ['USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1'],
'year': [1993, 1993, 1993, 1993,1993, 1993],
'month': [1, 1, 2, 2, 2, 3],
'attribute':['tmax', 'tmin', 'tmax', 'tmin', 'prcp', 'tmax'],
'day1': range(1, 7, 1),
'day2': range(1, 7, 1),
'day3': range(1, 7, 1),
'day4': range(1, 7, 1),
})
example_df = example_df[['station', 'year', 'month', 'attribute', 'day1', 'day2', 'day3', 'day4']]
This is the output I want:
solution_df = pd.DataFrame({'station': ['USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1','USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1'],
'year': [1993, 1993, 1993, 1993,1993, 1993, 1993, 1993, 1993, 1993,1993, 1993],
'month': [1, 1,1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
'day':[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
'tmax': [1, 1, 1, 1, 3, 3, 3, 3, 6, 6, 6, 6],
'tmin': [2, 2, 2, 2, 4, 4, 4, 4, np.nan, np.nan, np.nan, np.nan],
'prcp': [np.nan, np.nan, np.nan, np.nan, 5, 5, 5, 5, np.nan, np.nan, np.nan, np.nan]
})
solution_df = solution_df[['station', 'year', 'month', 'day', 'tmax', 'tmin', 'prcp']]
I tried .T, pivot, melt, stack and unstack to turn the day columns into rows matched to the correct months.
This is as close as I got to success with the example dataset:
record_arr = example_df.to_records()
new_df = pd.DataFrame({'station': np.nan,
'year': np.nan,
'month':np.nan,
'day': np.nan,
'tmax':np.nan,
'tmin': np.nan,
'prcp':np.nan},
index = [1]
)
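# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
new_df = pd.concat([new_df, pd.DataFrame([{'station': record_arr[0][1], 'year': record_arr[0][2], 'month': record_arr[0][3], 'tmax': record_arr[0][5], 'tmin': record_arr[1][5]}])], ignore_index=True)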
This requires a pivot as well as a melt (or an unstack and a stack). Here is how I got it in two steps:
df1 = example_df.set_index(['station', 'year', 'month', 'attribute']).stack().reset_index()
df1.set_index(['station', 'year', 'month', 'level_4','attribute'])[0].unstack().reset_index()
attribute station  year  month level_4  prcp  tmax  tmin
0            USC1  1993      1    day1   NaN   1.0   2.0
1            USC1  1993      1    day2   NaN   1.0   2.0
2            USC1  1993      1    day3   NaN   1.0   2.0
3            USC1  1993      1    day4   NaN   1.0   2.0
4            USC1  1993      2    day1   5.0   3.0   4.0
5            USC1  1993      2    day2   5.0   3.0   4.0
6            USC1  1993      2    day3   5.0   3.0   4.0
7            USC1  1993      2    day4   5.0   3.0   4.0
8            USC1  1993      3    day1   NaN   6.0   NaN
9            USC1  1993      3    day2   NaN   6.0   NaN
10           USC1  1993      3    day3   NaN   6.0   NaN
11           USC1  1993      3    day4   NaN   6.0   NaN
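To get from this result to the exact layout asked for in the question (an integer day column and the tmax, tmin, prcp column order), one option is a melt followed by an unstack. This is a minimal sketch, assuming the day columns are always named day1, day2, ... as in the example; `id_cols`, `long_df` and `result` are names picked here for illustration.

```python
import pandas as pd

# Example frame from the question, rebuilt so the snippet runs on its own.
example_df = pd.DataFrame({
    'station': ['USC1'] * 6,
    'year': [1993] * 6,
    'month': [1, 1, 2, 2, 2, 3],
    'attribute': ['tmax', 'tmin', 'tmax', 'tmin', 'prcp', 'tmax'],
    'day1': range(1, 7),
    'day2': range(1, 7),
    'day3': range(1, 7),
    'day4': range(1, 7),
})

id_cols = ['station', 'year', 'month', 'attribute']

# Wide -> long: one row per (station, year, month, attribute, day column).
long_df = example_df.melt(id_vars=id_cols, var_name='day', value_name='value')

# Turn the labels 'day1', 'day2', ... into the integers 1, 2, ...
long_df['day'] = long_df['day'].str.replace('day', '', regex=False).astype(int)

# Long -> wide: move 'attribute' into the columns, one per weather variable.
result = (
    long_df
    .set_index(['station', 'year', 'month', 'day', 'attribute'])['value']
    .unstack('attribute')
    .reset_index()
)
result.columns.name = None  # drop the leftover 'attribute' label on the columns

# Put the value columns in the order used in the question.
result = result[['station', 'year', 'month', 'day', 'tmax', 'tmin', 'prcp']]
print(result)
```

Months where a variable was not collected come out as NaN, which is why the value columns are floats. With the real 31 day columns, the same code should apply unchanged as long as the column names follow the day<N> pattern.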