Creating missing time slots and filling 0 values in pandas series

Question

Creating missing time slots and filling 0 values in pandas series

I have a pandas series as such (which I did by doing a group on two columns divvy_df.groupby(['from_station_name', 'starttime']).size()

):

from_station_name       starttime          
900 W Harrison          2014-07-01 08:00:00    1
                        2014-07-01 10:00:00    1
                        2014-07-01 11:00:00    1
                        2014-07-01 12:00:00    1
                        2014-07-01 13:00:00    1
                        2014-07-01 16:00:00    1
                        2014-07-01 17:00:00    3
                        2014-07-01 22:00:00    1
                        2014-07-02 01:00:00    1
                        2014-07-02 08:00:00    1
                        2014-07-02 12:00:00    2
                        ...

As you can see, the resulting series shows the number of times the start time is recorded for this station name. I would like to do this somehow so that the missing timeslots are created and filled with a value of 0. Is this possible?

So, I would like something like this:

from_station_name       starttime          
    900 W Harrison          2014-07-01 00:00:00    0
                            2014-07-01 01:00:00    0
                            2014-07-01 02:00:00    0
                            2014-07-01 03:00:00    0
                            ...
                            2014-07-01 08:00:00    1
                            2014-07-01 09:00:00    0   
                            2014-07-01 10:00:00    1
                            2014-07-01 11:00:00    1
                            2014-07-01 12:00:00    1
                            2014-07-01 13:00:00    0
                            ...
                            2014-07-01 22:00:00    1
                            2014-07-01 23:00:00    0
                            2014-07-02 00:00:00    0

What's the best way to get something like this?

+3

python pandas group-by

SpicyClubSauce 23 jul. At 1:45 am

source to share

1 answer

unutbu · Accepted Answer · 2015-07-23T02:25:30+0000

Usually, when you have a series, ts

with DatetimeIndex, you can expand it to include new dates using ts.reindex

, or ts.asfreq

or ts.resample

. Since it ts.reindex

has a parameter fill_value

to fill in missing values, I think which is the easiest to use in this case ts.reindex

.

However, this issue ts

has MultiIndex with from_station_name

and starttime

.

So the first step is to move the level from_station_name

out of the index. There are two options. You can use reset_index

to move it to a column or use unstack

to move it to a new index level of the column. I think unstacking

with

ts = ts.unstack(level=0)

is the best choice here because with the other option (using reset_index

) you run into problems with duplicate dates in the index and there is no problem of padding and padding with padding and forward-filling from_station_name

.

The series now has a sibling DatetimeIndex, so you can reindex

like this:

index = pd.date_range('2014-07-01', '2014-07-03', freq='H')
ts = ts.reindex(index, fill_value=0)

Now, undo the operation unstack

, so set the column index level back to the row index:

ts = ts.stack()
ts = ts.swaplevel(0,1)

import pandas as pd
ts = pd.Series({('900 W Harrison', '2014-07-01 08:00:00'): 1,
 ('900 W Harrison', '2014-07-01 10:00:00'): 1,
 ('900 W Harrison', '2014-07-01 11:00:00'): 1,
 ('900 W Harrison', '2014-07-01 12:00:00'): 1,
 ('900 W Harrison', '2014-07-01 13:00:00'): 1,
 ('900 W Harrison', '2014-07-01 16:00:00'): 1,
 ('900 W Harrison', '2014-07-01 17:00:00'): 3,
 ('900 W Harrison', '2014-07-01 22:00:00'): 1,
 ('900 W Harrison', '2014-07-02 01:00:00'): 1,
 ('900 W Harrison', '2014-07-02 08:00:00'): 1,
 ('900 W Harrison', '2014-07-02 12:00:00'): 2})

ts = ts.unstack(level=0)
# ensure ts.index is a DatetimeIndex
ts.index = pd.to_datetime(ts.index)
index = pd.date_range('2014-07-01', '2014-07-03', freq='H')
ts = ts.reindex(index, fill_value=0)
ts = ts.stack()
ts = ts.swaplevel(0,1)
ts = ts.sortlevel()
print(ts)

gives

900 W Harrison  2014-07-01 00:00:00    0
                2014-07-01 01:00:00    0
                2014-07-01 02:00:00    0
                2014-07-01 03:00:00    0
                2014-07-01 04:00:00    0
                2014-07-01 05:00:00    0
                2014-07-01 06:00:00    0
                2014-07-01 07:00:00    0
                2014-07-01 08:00:00    1
                2014-07-01 09:00:00    0
                2014-07-01 10:00:00    1
                ...
                2014-07-02 21:00:00    0
                2014-07-02 22:00:00    0
                2014-07-02 23:00:00    0
                2014-07-03 00:00:00    0
dtype: int64

Creating missing time slots and filling 0 values ​​in pandas series

More articles:

Creating missing time slots and filling 0 values in pandas series