Pandas, Python: Rotate some (31 day) dataframe columns and map them to existing (year, month) rows (NOAA data)

Question


I have NOAA weather data. In its raw state it has year and month as rows and days as columns. I want to expand the number of rows so that each row has a year, month and day, with the corresponding data in that row.

There is also a weather-variable column (attribute), where each row holds one weather variable recorded for that month. The number of variables recorded per month can vary: January has two (tmax, tmin), February has three (tmax, tmin, prcp), and March has one (tmax).

Here is an example df.

import numpy as np
import pandas as pd

example_df = pd.DataFrame({'station': ['USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1'],
                           'year': [1993, 1993, 1993, 1993, 1993, 1993],
                           'month': [1, 1, 2, 2, 2, 3],
                           'attribute': ['tmax', 'tmin', 'tmax', 'tmin', 'prcp', 'tmax'],
                           'day1': range(1, 7, 1),
                           'day2': range(1, 7, 1),
                           'day3': range(1, 7, 1),
                           'day4': range(1, 7, 1),
                           })
example_df = example_df[['station', 'year', 'month', 'attribute', 'day1', 'day2', 'day3', 'day4']]

This is the result I want:

solution_df = pd.DataFrame({'station': ['USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1'],
                            'year': [1993, 1993, 1993, 1993, 1993, 1993, 1993, 1993, 1993, 1993, 1993, 1993],
                            'month': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                            'day': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                            'tmax': [1, 1, 1, 1, 3, 3, 3, 3, 6, 6, 6, 6],
                            'tmin': [2, 2, 2, 2, 4, 4, 4, 4, np.nan, np.nan, np.nan, np.nan],
                            'prcp': [np.nan, np.nan, np.nan, np.nan, 5, 5, 5, 5, np.nan, np.nan, np.nan, np.nan],
                            })
solution_df = solution_df[['station', 'year', 'month', 'day', 'tmax', 'tmin', 'prcp']]

I tried .T, pivot, melt, stack and unstack to turn the day columns into rows matched to the correct month.

This is the closest I got with the example dataset:

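# field 0 of each record is the index, so fields 1-5 are station, year, month, attribute, day1
record_arr = example_df.to_records()

# placeholder frame with the target columns
new_df = pd.DataFrame({'station': np.nan,
                       'year': np.nan,
                       'month': np.nan,
                       'day': np.nan,
                       'tmax': np.nan,
                       'tmin': np.nan,
                       'prcp': np.nan},
                      index=[1])
# build one output row (January, day 1) by hand; DataFrame.append was removed in pandas 2.0, so use pd.concat
new_row = {'station': record_arr[0][1], 'year': record_arr[0][2], 'month': record_arr[0][3],
           'day': 1, 'tmax': record_arr[0][5], 'tmin': record_arr[1][5]}
new_df = pd.concat([new_df, pd.DataFrame([new_row])], ignore_index=True)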


ma8 · June 18, 2017 at 3:54

1 answer

Vaishali · Accepted Answer · 2017-06-18T04:12:05+0000

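This requires a pivot as well as a melt (or, equivalently, a stack followed by an unstack). This is how I did it in two steps:

# Step 1: stack the day columns into rows (day labels land in 'level_4', values in column 0)
df1 = example_df.set_index(['station', 'year', 'month', 'attribute']).stack().reset_index()
# Step 2: unstack 'attribute' so each weather variable becomes its own column
result = df1.set_index(['station', 'year', 'month', 'level_4', 'attribute'])[0].unstack().reset_index()
print(result)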


attribute   station year    month   level_4 prcp    tmax    tmin
0           USC1    1993    1       day1    NaN     1.0     2.0
1           USC1    1993    1       day2    NaN     1.0     2.0
2           USC1    1993    1       day3    NaN     1.0     2.0
3           USC1    1993    1       day4    NaN     1.0     2.0
4           USC1    1993    2       day1    5.0     3.0     4.0
5           USC1    1993    2       day2    5.0     3.0     4.0
6           USC1    1993    2       day3    5.0     3.0     4.0
7           USC1    1993    2       day4    5.0     3.0     4.0
8           USC1    1993    3       day1    NaN     6.0     NaN
9           USC1    1993    3       day2    NaN     6.0     NaN
10          USC1    1993    3       day3    NaN     6.0     NaN
11          USC1    1993    3       day4    NaN     6.0     NaN
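The same two steps can also be written with the melt/pivot pair the answer mentions. This is a sketch of that alternative, not the answer's own code. It assumes every day column is named day<N> (day1 to day31 in the full NOAA file) and that each (station, year, month, attribute) combination occurs only once, because DataFrame.pivot raises on duplicate entries. It also converts the day labels to integers and orders the columns like solution_df. Passing a list to pivot's index argument needs pandas 1.1 or newer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example frame (4 day columns instead of 31).
example_df = pd.DataFrame({
    'station': ['USC1'] * 6,
    'year': [1993] * 6,
    'month': [1, 1, 2, 2, 2, 3],
    'attribute': ['tmax', 'tmin', 'tmax', 'tmin', 'prcp', 'tmax'],
    'day1': range(1, 7),
    'day2': range(1, 7),
    'day3': range(1, 7),
    'day4': range(1, 7),
})

# Step 1 (melt): one row per (station, year, month, attribute, day label).
long_df = example_df.melt(
    id_vars=['station', 'year', 'month', 'attribute'],
    var_name='day',
    value_name='value',
)
# Assumes day columns are named 'day<N>': turn 'day1', 'day2', ... into 1, 2, ...
long_df['day'] = long_df['day'].str.replace('day', '', regex=False).astype('int64')

# Step 2 (pivot): spread the attributes back out into columns.
# Attributes missing for a month (e.g. prcp in January) become NaN.
result = (
    long_df.pivot(index=['station', 'year', 'month', 'day'],
                  columns='attribute', values='value')
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'attribute' columns label
)

# Match solution_df's column order (an attribute that never occurs would appear as all-NaN).
result = result.reindex(columns=['station', 'year', 'month', 'day', 'tmax', 'tmin', 'prcp'])
print(result)
```

Because every raw row carries 31 day columns, the long format also contains dates that do not exist, such as February 30; those rows may need to be dropped or masked afterwards. Swapping pivot for pivot_table would tolerate duplicate entries, but it would silently average them with its default aggfunc.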
