Group data by seasons according to exact dates

I have a csv file containing 4 years of data and I am trying to group the data for a season over 4 years, in other words I need to sum and print all my data for only 4 seasons. here look at my data file:

timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
....
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

      

and here's my desired output:

winter     (the mean value of impacts)
summer     (the mean value of impacts)
autumn      ....
spring      .....

      

Actually I tried this code:

names =["timestamp","heure","lat","lon","impact","type"]
data = pd.read_csv('flash.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'], dayfirst=True)

spring = range(80, 172)
summer = range(172, 264)
fall = range(264, 355)

def season(x):
    if x in spring:
       return 'Spring'
    if x in summer:
       return 'Summer'
    if x in fall:
       return 'Fall'
   else :
       return 'Winter'

 data['SEASON'] = data.index.to_series().dt.month.map(lambda x : season(x))
 data['impact'] = data['impact'].abs()
 seasonly = data.groupby('SEASON')['impact'].mean()

      

and I got this horrible result:  enter image description here

where am i wrong?

+3


source to share


3 answers


You need DatetimeIndex.dayofyear

:

data['SEASON'] = data.index.dayofyear.map(season)

      



Another solution with pandas.cut

:

bins = [0, 91, 183, 275, 366]
labels=['Winter', 'Spring', 'Summer', 'Fall']
doy = data.index.dayofyear
data['SEASON1'] = pd.cut(doy + 11 - 366*(doy > 355), bins=bins, labels=labels)

      

+2


source


pandas.cut


To properly handle 'Winter'

both the beginning and the end of the year, I moved dayofyear

to 11

and got the results modulo 366

. The reason I am not using the same method as in the solution below numpy

is because it pd.cut

returns a categorical type and I would get 5 categories in which two categories had the same label. I could then cast the result as a string, but that was messy.

data['SEASON'] = pd.cut(
    (data.index.dayofyear + 11) % 366,
    [0, 91, 183, 275, 366],
    labels=['Winter', 'Spring', 'Summer', 'Fall']
)

      


numpy.searchsorted


To handle properly 'Winter'

both at the beginning and end of the year, I allowed two bins for'Winter'

seasons = np.array(['Winter', 'Spring', 'Summer', 'Fall', 'Winter'])
f = np.searchsorted([80, 172, 264, 355], data.index.dayofyear)
data['SEASON'] = seasons[f]

      




plot

data.groupby('SEASON')['impact'].mean().plot.bar()

      

enter image description here

+3


source


Looks like:

data['SEASON'] = data.index.to_series().dt.**month**.map(lambda x : season(x))

      

uses a month, presumably 1-12 or 0-11, which is all "winter". You need to use the day of the year.

But you could probably see this more easily and made it possible to print it out to test it yourself if you hadn't blocked the day's fetch on one liner. I'm just saying.

+2


source







All Articles