Group data by seasons according to exact dates

Question

Group data by seasons according to exact dates

I have a csv file containing 4 years of data and I am trying to group the data for a season over 4 years, in other words I need to sum and print all my data for only 4 seasons. here look at my data file:

timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
....
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

and here's my desired output:

winter     (the mean value of impacts)
summer     (the mean value of impacts)
autumn      ....
spring      .....

Actually I tried this code:

names =["timestamp","heure","lat","lon","impact","type"]
data = pd.read_csv('flash.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'], dayfirst=True)

spring = range(80, 172)
summer = range(172, 264)
fall = range(264, 355)

def season(x):
    if x in spring:
       return 'Spring'
    if x in summer:
       return 'Summer'
    if x in fall:
       return 'Fall'
   else :
       return 'Winter'

 data['SEASON'] = data.index.to_series().dt.month.map(lambda x : season(x))
 data['impact'] = data['impact'].abs()
 seasonly = data.groupby('SEASON')['impact'].mean()

and I got this horrible result:

where am i wrong?

+3

python-2.7 pandas pandas-groupby

Marie antoinette June 13. 17 at 16:11

source to share

3 answers

pandas.cut

To properly handle 'Winter'

both the beginning and the end of the year, I moved dayofyear

to 11

and got the results modulo 366

. The reason I am not using the same method as in the solution below numpy

is because it pd.cut

returns a categorical type and I would get 5 categories in which two categories had the same label. I could then cast the result as a string, but that was messy.

data['SEASON'] = pd.cut(
    (data.index.dayofyear + 11) % 366,
    [0, 91, 183, 275, 366],
    labels=['Winter', 'Spring', 'Summer', 'Fall']
)

numpy.searchsorted

To handle properly 'Winter'

both at the beginning and end of the year, I allowed two bins for'Winter'

seasons = np.array(['Winter', 'Spring', 'Summer', 'Fall', 'Winter'])
f = np.searchsorted([80, 172, 264, 355], data.index.dayofyear)
data['SEASON'] = seasons[f]

plot

data.groupby('SEASON')['impact'].mean().plot.bar()

+3

piRSquared June 13. 17 at 16:39

source to share

Looks like:

data['SEASON'] = data.index.to_series().dt.**month**.map(lambda x : season(x))

uses a month, presumably 1-12 or 0-11, which is all "winter". You need to use the day of the year.

But you could probably see this more easily and made it possible to print it out to test it yourself if you hadn't blocked the day's fetch on one liner. I'm just saying.

+2

barny June 13. 17 at 16:17

source to share

jezrael · Accepted Answer · 2017-06-13T16:33:34+0000

You need DatetimeIndex.dayofyear

:

data['SEASON'] = data.index.dayofyear.map(season)

Another solution with pandas.cut

:

bins = [0, 91, 183, 275, 366]
labels=['Winter', 'Spring', 'Summer', 'Fall']
doy = data.index.dayofyear
data['SEASON1'] = pd.cut(doy + 11 - 366*(doy > 355), bins=bins, labels=labels)

Group data by seasons according to exact dates

More articles: