Pandas DataFrame: group by a user-defined span of months

Question

What is the best approach to group data by winter season, where each season runs from October through April? TimeGrouper handles evenly spaced frequencies, but I can't get it to output seasonal sums for the winters 1972/1973, 1973/1974, etc. It may be trivial, but I don't see how to do it without writing an overly complicated solution.

                 sd_x       sd_y
1972-10-31   0.000000   0.709677
1972-11-30   1.720838   4.366667
1972-12-31  15.893438   5.600000
1973-01-31   6.256230   6.548387
1973-02-28   0.653714  53.142857
1973-03-31   0.000000  70.354839
1973-04-30   0.000000  11.700000
1973-10-31   0.000000   0.096774
1973-11-30   0.000000   4.266667
1973-12-31   0.394652  53.419355
1974-01-31   4.540915  46.645161
1974-02-28   2.978056  35.571429
1974-03-31   0.000000   4.967742
1974-04-30   0.000000   0.000000
1974-10-31   0.000000   0.064516
1974-11-30   0.000000   1.000000
1974-12-31   5.585954  20.096774
1975-01-31  50.498147  24.580645
1975-02-28  35.906097  22.000000
1975-03-31   0.457109   5.483871
1975-04-30   0.000000   0.433333

+3

python pandas group-by pandas-groupby

Manuel Apr 20 17 at 22:38

2 answers

In [94]: df.groupby((df.index - pd.DateOffset(months=4)).year).sum()
Out[94]:
           sd_x        sd_y
1972  24.524220  152.422427
1973   7.913623  144.967128
1974  92.447307   73.659139
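The one-liner above works because subtracting four months moves every October through April date into the calendar year in which that winter began. A minimal self-contained sketch, using a few rows from the question's sd_x column (the names idx, season_start_year and seasonal are illustrative, not from the original answer):

```python
import pandas as pd

# A few month-end rows from the question, covering two winters.
idx = pd.to_datetime([
    "1972-10-31", "1972-12-31", "1973-04-30",
    "1973-10-31", "1974-01-31", "1974-04-30",
])
df = pd.DataFrame(
    {"sd_x": [0.0, 15.893438, 0.0, 0.0, 4.540915, 0.0]},
    index=idx,
)

# Shift each date back four months: October becomes June and April becomes
# December of the previous year, so a whole winter shares one calendar year.
season_start_year = (df.index - pd.DateOffset(months=4)).year

# Group on the shifted year; the original dates are only used to build the key.
seasonal = df.groupby(season_start_year).sum()
print(seasonal)
```

The resulting group label is the year in which each winter starts.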

+3

MaxU Apr 20 17 at 22:49

piRSquared · Accepted Answer · 2017-04-20T22:48:12+0000

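Use pd.offsets.MonthBegin to shift the dates back so that each October through April season falls within a single calendar year. Because the month-end dates in the index are not month starts, the first of the 5 steps only rolls back to the first of the same month, so each date lands on the first of the month four months earlier.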

shifted_months = df.index - pd.offsets.MonthBegin(5)
shifted_months

DatetimeIndex(['1972-06-01', '1972-07-01', '1972-08-01', '1972-09-01',
               '1972-10-01', '1972-11-01', '1972-12-01', '1973-06-01',
               '1973-07-01', '1973-08-01', '1973-09-01', '1973-10-01',
               '1973-11-01', '1973-12-01', '1974-06-01', '1974-07-01',
               '1974-08-01', '1974-09-01', '1974-10-01', '1974-11-01',
               '1974-12-01'],
              dtype='datetime64[ns]', freq=None)
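The rollback behaviour can be checked on single timestamps. This is a small sketch, not part of the original answer; the variable names are illustrative:

```python
import pandas as pd

month_end = pd.Timestamp("1972-10-31")    # not on a month start
month_start = pd.Timestamp("1972-10-01")  # already on a month start

# Off-anchor date: the first step only rolls back to 1972-10-01,
# so five steps end four months earlier.
from_end = month_end - pd.offsets.MonthBegin(5)

# On-anchor date: all five steps are full months.
from_start = month_start - pd.offsets.MonthBegin(5)

# With only four steps, a month-end date in April stays in its own year,
# which would split the winter across two groups.
april_four_steps = pd.Timestamp("1973-04-30") - pd.offsets.MonthBegin(4)

print(from_end, from_start, april_four_steps)
```

This is why 5 rather than 4 is used with the month-end index in the question.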

Then we can use the .year attribute of the shifted index for the groupby and sum:

df.groupby(shifted_months.year).sum()

           sd_x        sd_y
1972  24.524220  152.422427
1973   7.913623  144.967128
1974  92.447307   73.659139

We can relabel the index as seasons with rename:

df.groupby(shifted_months.year).sum().rename(lambda x: '{}/{}'.format(x, x + 1))

                sd_x        sd_y
1972/1973  24.524220  152.422427
1973/1974   7.913623  144.967128
1974/1975  92.447307   73.659139
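For a start month other than October, one possible alternative is to build the season label directly from the month and year rather than shifting dates. This is a sketch of a different technique from the answers above; season_labels and start_month are hypothetical names. It assumes the frame contains only in-season months; any off-season rows would be folded into the preceding season and should be filtered out first.

```python
import numpy as np
import pandas as pd

def season_labels(index, start_month=10):
    """Return a 'YYYY/YYYY+1' label per date for seasons starting in start_month.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Dates before start_month belong to the season that began the previous year.
    start_year = np.where(index.month >= start_month, index.year, index.year - 1)
    return pd.Index([f"{y}/{y + 1}" for y in start_year])

# Four rows of sd_y from the question: the first and last month of two winters.
idx = pd.to_datetime(["1972-10-31", "1973-04-30", "1973-10-31", "1974-04-30"])
df = pd.DataFrame({"sd_y": [0.709677, 11.7, 0.096774, 0.0]}, index=idx)

seasonal = df.groupby(season_labels(df.index)).sum()
print(seasonal)
```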
