Pandas, Python: Rotate some (31 day) dataframe columns and map them to existing (year, month) rows (NOAA data)

I have NOAA weather data. In its raw state it has one row per year and month, with the days as columns. I want to expand the rows so that each row has a year, month and day, with the corresponding data in that row.

There is also a weather variable column, where each row holds one weather variable collected for that month. The number of weather variables collected per month can vary: in January there are two (tmax, tmin), in February there are three (tmax, tmin, prcp), and in March there is one (tmax).

Here is an example df.

import numpy as np
import pandas as pd

example_df = pd.DataFrame({'station': ['USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1'],
                           'year': [1993, 1993, 1993, 1993, 1993, 1993],
                           'month': [1, 1, 2, 2, 2, 3],
                           'attribute': ['tmax', 'tmin', 'tmax', 'tmin', 'prcp', 'tmax'],
                           'day1': range(1, 7),
                           'day2': range(1, 7),
                           'day3': range(1, 7),
                           'day4': range(1, 7),
                           })
example_df = example_df[['station', 'year', 'month', 'attribute', 'day1', 'day2', 'day3', 'day4']]

      

This is the solution I want:

solution_df = pd.DataFrame({'station': ['USC1'] * 12,
                            'year': [1993] * 12,
                            'month': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                            'day': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                            'tmax': [1, 1, 1, 1, 3, 3, 3, 3, 6, 6, 6, 6],
                            'tmin': [2, 2, 2, 2, 4, 4, 4, 4, np.nan, np.nan, np.nan, np.nan],
                            'prcp': [np.nan, np.nan, np.nan, np.nan, 5, 5, 5, 5, np.nan, np.nan, np.nan, np.nan]
                            })
solution_df = solution_df[['station', 'year', 'month', 'day', 'tmax', 'tmin', 'prcp']]

      

I have tried .T, pivot, melt, stack and unstack to turn the day columns into rows aligned with the correct months.

This is the closest I have gotten with the example dataset.

record_arr = example_df.to_records()

new_df = pd.DataFrame({'station': np.nan,
                  'year': np.nan,
                  'month':np.nan, 
                  'day': np.nan,
                  'tmax':np.nan,
                  'tmin': np.nan,
                  'prcp':np.nan},
                   index = [1]
                 )
new_df.append({'station': record_arr[0][1], 'year': record_arr[0][2], 'month': record_arr[0][3], 'tmax': record_arr[0][5], 'tmin': record_arr[1][5]}, ignore_index=True)

      

1 answer


This requires a stack as well as an unstack (equivalently, a melt plus a pivot). This is how I got it in two steps:



# stack the day columns into rows: one row per station/year/month/attribute/day column
df1 = example_df.set_index(['station', 'year', 'month', 'attribute']).stack().reset_index()
# unstack the attributes back into columns, keeping the day level (named 'level_4') as rows
df1.set_index(['station', 'year', 'month', 'level_4', 'attribute'])[0].unstack().reset_index()


attribute   station year    month   level_4 prcp    tmax    tmin
0           USC1    1993    1       day1    NaN     1.0     2.0
1           USC1    1993    1       day2    NaN     1.0     2.0
2           USC1    1993    1       day3    NaN     1.0     2.0
3           USC1    1993    1       day4    NaN     1.0     2.0
4           USC1    1993    2       day1    5.0     3.0     4.0
5           USC1    1993    2       day2    5.0     3.0     4.0
6           USC1    1993    2       day3    5.0     3.0     4.0
7           USC1    1993    2       day4    5.0     3.0     4.0
8           USC1    1993    3       day1    NaN     6.0     NaN
9           USC1    1993    3       day2    NaN     6.0     NaN
10          USC1    1993    3       day3    NaN     6.0     NaN
11          USC1    1993    3       day4    NaN     6.0     NaN
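
To match solution_df exactly, the helper level_4 column still has to become a numeric day column. A minimal follow-up sketch, assuming the unstacked result above is assigned to df2 (a name used here only for illustration):

df2 = df1.set_index(['station', 'year', 'month', 'level_4', 'attribute'])[0].unstack().reset_index()

# strip the 'day' prefix to get an integer day, then drop the helper column
df2['day'] = df2['level_4'].str.replace('day', '', regex=False).astype(int)
df2 = df2.drop(columns='level_4')[['station', 'year', 'month', 'day', 'tmax', 'tmin', 'prcp']]
df2.columns.name = None  # remove the leftover 'attribute' columns name

This reproduces solution_df up to dtypes; the value columns come back as floats because of the NaNs.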

      
