Convert pandas data from monthly to daily

I have a dataframe with 2014 monthly data for a series of 317 stock tickers (317 tickers x 12 months = 3,804 rows in the DF). I would like to convert it to a daily dataframe (317 tickers x 365 days = 115,705 rows). So I believe I need to upsample or reindex, spreading the monthly values across every day of the month, but I can't seem to get it to work correctly.

The dataframe is currently in this format:

>>> df
month    ticker   b    c
2014-1   AAU      10   .04     #different values every month for each ticker
2014-2   AAU      20   .03
2014-3   AAU      13   .06
.
2014-12  AAU      11   .03
.
.
.
2014-1   ZZY      11   .11
2014-2   ZZY      6    .03
.
2014-12  ZZY      17   .09

      

And here's what I would like:

>>> df
day          ticker   b    c
2014-01-01   AAU      10   .04  #same values every day in month for each ticker
2014-01-02   AAU      10   .04
2014-01-03   AAU      10   .04
.
2014-01-31   AAU      10   .04
2014-02-01   AAU      20   .03
2014-02-02   AAU      20   .03
.
2014-02-28   AAU      20   .03
.
.
.
2014-12-30   ZZY      17   .09 
2014-12-31   ZZY      17   .09 

      

I tried a groupby combined with resampling by day, but the resulting dataframe starts at "2014-01-13" instead of January 1 and ends at "2014-12-01" instead of December 31. I also tried changing the month values from, for example, "2014-1" to "2014-01-01" and so on, but the resampled data still ends at "2014-01-01". There should be an easier way to do this, so I would appreciate any help. I've been going around in circles all day.



2 answers


First, parse the month strings into pandas Timestamps:

df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
#        month ticker   b     c
# 0 2014-01-01    AAU  10  0.04
# 1 2014-02-01    AAU  20  0.03
# 2 2014-03-01    AAU  13  0.06
# 3 2014-12-01    AAU  11  0.03
# 4 2014-01-01    ZZY  11  0.11
# 5 2014-02-01    ZZY   6  0.03
# 6 2014-12-01    ZZY  17  0.09

      

Then pivot the DataFrame, using month as the index and ticker as a column level:

df = df.pivot(index='month', columns='ticker')
#              b         c      
# ticker     AAU ZZY   AAU   ZZY
# month                         
# 2014-01-01  10  11  0.04  0.11
# 2014-02-01  20   6  0.03  0.03
# 2014-03-01  13 NaN  0.06   NaN
# 2014-12-01  11  17  0.03  0.09

      

By pivoting, we can forward-fill each column more easily later.

Now find the start and end dates:

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)

      

It is interesting to note that adding pd.DateOffset(day=31) will not always result in a date that falls on the 31st. If the month is February, adding pd.DateOffset(day=31) returns the last day of February:

In [130]: pd.Timestamp('2014-2-28') + pd.DateOffset(day=31)
Out[130]: Timestamp('2014-02-28 00:00:00')

      

This is good, as it means adding pd.DateOffset(day=31) will always give us the last valid day of the month.
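To see the clamping behaviour across a few different months, here is a small sketch. The sample dates and the names `samples` and `last_days` are chosen only for illustration; `pd.offsets.MonthEnd(0)` is shown as a comparison, not as something the answer above uses.

```python
import pandas as pd

# Illustrative dates: a 28-day month, a 30-day month, a leap-year
# February, and a date that is already the last day of its month.
samples = ['2014-02-01', '2014-04-15', '2016-02-10', '2014-12-31']

# DateOffset(day=31) sets the day to 31, clamped to the month's length.
last_days = {s: pd.Timestamp(s) + pd.DateOffset(day=31) for s in samples}

# MonthEnd(0) rolls forward to the month end and leaves a date that is
# already a month end unchanged, so it should agree on these samples.
rolled = {s: pd.Timestamp(s) + pd.offsets.MonthEnd(0) for s in samples}

for s in samples:
    print(s, '->', last_days[s].date(), rolled[s].date())
```

Both spellings give the same result for these inputs; which one reads better is a matter of taste.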

Now we can reindex and forward-fill the DataFrame:



dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')

      

which gives

In [160]: df.head()
Out[160]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-01-01  10  11  0.04  0.11
2014-01-02  10  11  0.04  0.11
2014-01-03  10  11  0.04  0.11
2014-01-04  10  11  0.04  0.11
2014-01-05  10  11  0.04  0.11

In [161]: df.tail()
Out[161]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-12-27  11  17  0.03  0.09
2014-12-28  11  17  0.03  0.09
2014-12-29  11  17  0.03  0.09
2014-12-30  11  17  0.03  0.09
2014-12-31  11  17  0.03  0.09
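One caveat that may matter if a ticker is missing a month: `reindex(..., method='ffill')` fills by matching index labels, so a NaN that already exists in the monthly data is carried into the daily rows rather than filled. The sketch below uses a made-up three-month series (the values and the names `monthly`, `by_label`, `by_value` are illustrative only) and shows a follow-up `.ffill()` as one possible way to fill by value, if that is what you want.

```python
import numpy as np
import pandas as pd

# Made-up monthly series with a missing value in March.
monthly = pd.Series(
    [6.0, np.nan, 17.0],
    index=pd.to_datetime(['2014-02-01', '2014-03-01', '2014-04-01']),
)
dates = pd.date_range('2014-02-01', '2014-04-30', freq='D')

# Label-based fill: every March day maps to the 2014-03-01 row, which is NaN.
by_label = monthly.reindex(dates, method='ffill')

# Value-based fill on top: March now inherits February's value.
by_value = by_label.ffill()

mid_march = pd.Timestamp('2014-03-15')
print(by_label.loc[mid_march], by_value.loc[mid_march])
```

Whether carrying the previous month's value into a missing month is appropriate depends on the data, so treat the second step as optional.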

      

To move ticker out of the column index and back into a regular column:

df = df.stack('ticker')
df = df.sort_index(level=1)  # sort_index replaces the older, since-removed sortlevel
df = df.reset_index()

      


So all together:

import pandas as pd
df = pd.read_table('data', sep=r'\s+')  # raw string avoids an invalid-escape warning
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
df = df.pivot(index='month', columns='ticker')

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')

df = df.stack('ticker')
df = df.sort_index(level=1)  # sort_index replaces the older, since-removed sortlevel
df = df.reset_index()

      

gives

In [163]: df.head()
Out[163]: 
        date ticker   b     c
0 2014-01-01    AAU  10  0.04
1 2014-01-02    AAU  10  0.04
2 2014-01-03    AAU  10  0.04
3 2014-01-04    AAU  10  0.04
4 2014-01-05    AAU  10  0.04

In [164]: df.tail()
Out[164]: 
          date ticker   b     c
450 2014-12-27    ZZY  17  0.09
451 2014-12-28    ZZY  17  0.09
452 2014-12-29    ZZY  17  0.09
453 2014-12-30    ZZY  17  0.09
454 2014-12-31    ZZY  17  0.09
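For anyone who wants to try the steps without a data file, here is a self-contained sketch of the same pipeline. It assumes a tiny inline sample (only the January and February rows for AAU and ZZY from the question), uses `sort_index` because `sortlevel` is not available in recent pandas releases, and the names `raw`, `wide` and `daily` are mine rather than part of the answer.

```python
import io
import pandas as pd

# Inline sample: January and February rows for two tickers from the question.
raw = """month ticker b c
2014-1 AAU 10 .04
2014-2 AAU 20 .03
2014-1 ZZY 11 .11
2014-2 ZZY 6 .03
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')

# One row per month, one column per (value, ticker) pair.
wide = df.pivot(index='month', columns='ticker')

# Daily index from the first day of the first month
# to the last day of the last month.
start_date = wide.index.min() - pd.DateOffset(day=1)
end_date = wide.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D', name='date')
wide = wide.reindex(dates, method='ffill')

# Back to long format: one row per (ticker, date).
daily = wide.stack('ticker').sort_index(level='ticker').reset_index()
print(daily.head())
print(daily.tail())
```

With two tickers and 59 days (January plus February 2014), `daily` should contain 118 rows. Depending on the pandas version, `stack` may emit a FutureWarning about its upcoming implementation.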

      



Let's run a synthetic experiment. Suppose we have a daily time series:

dates = pd.date_range(start, end, freq='D')
ts = pd.Series(data, index=dates)

      

Create a monthly time series by averaging all data over a month:

ts_mon = ts.resample('MS').mean()  # the how= argument was removed from resample

      



Now try upsampling this monthly time series to a daily time series with uniform values throughout each month. The first method, which borrows a step from @unutbu's answer using reindex, works well:

ts_daily = ts_mon.reindex(dates, method='ffill')
Out:
  2000-01-01 100.21
  2000-01-02 100.21
  ...
  2000-12-30 80.75
  2000-12-31 80.75

      

The second method, using resample, does not work, because the result stops at the first day of the last month:

ts_daily = ts_mon.resample('D').ffill()
Out:
  2000-01-01 100.21
  2000-01-02 100.21
  ...
  2000-11-30 99.33
  2000-12-01 80.75
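The difference between the two methods can be reproduced with a short, runnable sketch. The three monthly values below are made up for illustration (they are not the values from the experiment above), and the names `ts_resampled` and `ts_reindexed` are mine.

```python
import pandas as pd

# Made-up monthly series stamped at month starts (Jan-Mar 2000).
ts_mon = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range('2000-01-01', periods=3, freq='MS'),
)

# resample only spans the existing index, so it stops at 2000-03-01.
ts_resampled = ts_mon.resample('D').ffill()

# reindex against an explicit daily range that runs to the end
# of the last month covers every day of March as well.
dates = pd.date_range(
    ts_mon.index.min(),
    ts_mon.index.max() + pd.offsets.MonthEnd(0),
    freq='D',
)
ts_reindexed = ts_mon.reindex(dates, method='ffill')

print(ts_resampled.index[-1], ts_reindexed.index[-1])
```

The resampled series ends on the first day of the last month, while the reindexed one runs through the last day, which is why the reindex route suits this task.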

      
