Problems with grouping pandas data by hour

First, my dataset is shown here.

What I would like to do is group my data by the hour of pickup_datetime. I found a related question here, but for some reason the solution doesn't seem to work. I've included my attempts below.

I first started with this:

df["dropoff_datetime"] = pd.to_datetime(df["dropoff_datetime"])
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])

test = df.groupby(df.hour).sum()

      

And I got the following error:

AttributeError: 'DataFrame' object has no attribute 'hour'

      

Then I tried this:

test = df.groupby(df.dropoff_datetime.hour).sum()

      

And I got the following error:

AttributeError: 'Series' object has no attribute 'hour'

      

I am a bit confused because my situation seems to be the same as the one in the question linked above, and I'm not sure why I am getting errors. Any help would be much appreciated.



2 answers


We can use the Series.dt.hour accessor:

test = df.groupby(df['pickup_datetime'].dt.hour).sum()
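To try this end to end, here is a minimal sketch on made-up data. Only the column names pickup_datetime and dropoff_datetime come from the question; passenger_count and all values are assumptions for illustration. Depending on your pandas version, calling .sum() across datetime columns may raise an error, so the sketch sums one numeric column explicitly.

```python
import pandas as pd

# Hypothetical sample data: passenger_count and all values are invented.
df = pd.DataFrame({
    "pickup_datetime": ["2017-08-01 13:05:00", "2017-08-01 13:40:00", "2017-08-01 20:15:00"],
    "dropoff_datetime": ["2017-08-01 13:20:00", "2017-08-01 14:02:00", "2017-08-01 20:45:00"],
    "passenger_count": [1, 2, 4],
})

# Convert the string columns to datetime64 so the .dt accessor is available.
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
df["dropoff_datetime"] = pd.to_datetime(df["dropoff_datetime"])

# A datetime column is a Series, so the hour is reached through .dt.
pickup_hour = df["pickup_datetime"].dt.hour

# Group rows by pickup hour and sum a numeric column.
passengers_per_hour = df.groupby(pickup_hour)["passenger_count"].sum()
print(passengers_per_hour)
```

The result is a Series indexed by hour of day, with one entry per hour that appears in the data.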

      

Here's an example describing the difference:

In [136]: times = pd.to_datetime(['2017-08-01 13:13:13', '2017-08-01 20:20:20'])

In [137]: times
Out[137]: DatetimeIndex(['2017-08-01 13:13:13', '2017-08-01 20:20:20'], dtype='datetime64[ns]', freq=None)

In [138]: type(times)
Out[138]: pandas.core.indexes.datetimes.DatetimeIndex

In [139]: times.hour
Out[139]: Int64Index([13, 20], dtype='int64')

      

As shown above, a DatetimeIndex has a "direct" .hour accessor, but a Series of datetime dtype has a .dt.hour accessor:

In [140]: df = pd.DataFrame({'Date': times})

In [141]: df
Out[141]:
                 Date
0 2017-08-01 13:13:13
1 2017-08-01 20:20:20

In [142]: type(df.Date)
Out[142]: pandas.core.series.Series

In [143]: df['Date'].dt.hour
Out[143]:
0    13
1    20
Name: Date, dtype: int64

      



If we set the Date column as the index:

In [146]: df.index = df['Date']

In [147]: df
Out[147]:
                                   Date
Date
2017-08-01 13:13:13 2017-08-01 13:13:13
2017-08-01 20:20:20 2017-08-01 20:20:20

      

the index becomes a DatetimeIndex:

In [149]: type(df.index)
Out[149]: pandas.core.indexes.datetimes.DatetimeIndex

      

so that we can access .hour directly (without the .dt accessor):

In [148]: df.index.hour
Out[148]: Int64Index([13, 20], dtype='int64', name='Date')

      



You need .dt because you are working with a Series (Series.dt.hour):

test = df.groupby(df.dropoff_datetime.dt.hour).sum()

      




But if you have a DatetimeIndex, omit it and use DatetimeIndex.hour:

test = df.groupby(df.index.hour).sum()
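The two routes can be compared side by side. This is only a sketch on made-up data: the fare column and its values are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical data: fare is an invented numeric column for illustration.
df = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(
        ["2017-08-01 13:05:00", "2017-08-01 13:40:00", "2017-08-01 20:15:00"]
    ),
    "fare": [10.0, 5.5, 7.0],
})

# Route 1: the column is a Series, so go through the .dt accessor.
by_series = df.groupby(df["pickup_datetime"].dt.hour)["fare"].sum()

# Route 2: move the column into the index, which makes it a DatetimeIndex
# with a direct .hour attribute.
indexed = df.set_index("pickup_datetime")
by_index = indexed.groupby(indexed.index.hour)["fare"].sum()

print(by_series.tolist())
print(by_index.tolist())
```

Both groupings should give the same totals per hour, so which one to use mostly depends on whether the timestamps already live in the index.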

      
