Group data by seasons according to exact dates
I have a csv file containing 4 years of data and I am trying to group the data for a season over 4 years, in other words I need to sum and print all my data for only 4 seasons. here look at my data file:
timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
....
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
and here's my desired output:
winter (the mean value of impacts)
summer (the mean value of impacts)
autumn ....
spring .....
Actually I tried this code:
names =["timestamp","heure","lat","lon","impact","type"]
data = pd.read_csv('flash.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'], dayfirst=True)
spring = range(80, 172)
summer = range(172, 264)
fall = range(264, 355)
def season(x):
if x in spring:
return 'Spring'
if x in summer:
return 'Summer'
if x in fall:
return 'Fall'
else :
return 'Winter'
data['SEASON'] = data.index.to_series().dt.month.map(lambda x : season(x))
data['impact'] = data['impact'].abs()
seasonly = data.groupby('SEASON')['impact'].mean()
and I got this horrible result:
where am i wrong?
source to share
You need DatetimeIndex.dayofyear
:
data['SEASON'] = data.index.dayofyear.map(season)
Another solution with pandas.cut
:
bins = [0, 91, 183, 275, 366] labels=['Winter', 'Spring', 'Summer', 'Fall'] doy = data.index.dayofyear data['SEASON1'] = pd.cut(doy + 11 - 366*(doy > 355), bins=bins, labels=labels)
source to share
pandas.cut
To properly handle 'Winter'
both the beginning and the end of the year, I moved dayofyear
to 11
and got the results modulo 366
. The reason I am not using the same method as in the solution below numpy
is because it pd.cut
returns a categorical type and I would get 5 categories in which two categories had the same label. I could then cast the result as a string, but that was messy.
data['SEASON'] = pd.cut(
(data.index.dayofyear + 11) % 366,
[0, 91, 183, 275, 366],
labels=['Winter', 'Spring', 'Summer', 'Fall']
)
numpy.searchsorted
To handle properly 'Winter'
both at the beginning and end of the year, I allowed two bins for'Winter'
seasons = np.array(['Winter', 'Spring', 'Summer', 'Fall', 'Winter'])
f = np.searchsorted([80, 172, 264, 355], data.index.dayofyear)
data['SEASON'] = seasons[f]
plot
data.groupby('SEASON')['impact'].mean().plot.bar()
source to share
Looks like:
data['SEASON'] = data.index.to_series().dt.**month**.map(lambda x : season(x))
uses a month, presumably 1-12 or 0-11, which is all "winter". You need to use the day of the year.
But you could probably see this more easily and made it possible to print it out to test it yourself if you hadn't blocked the day's fetch on one liner. I'm just saying.
source to share