Convert pandas data from monthly to daily
I have a dataframe with 2014 monthly data for a series of 317 stock tickers (317 tickers x 12 months = 3,804 rows in the DataFrame). I would like to convert it to a daily dataframe (317 tickers x 365 days = 115,705 rows). So I believe I need to upsample or reindex, spreading the monthly values across each day of the month, but I can't seem to get it to work correctly.
The dataframe is currently in this format:
>>> df
month ticker b c
2014-1 AAU 10 .04 #different values every month for each ticker
2014-2 AAU 20 .03
2014-3 AAU 13 .06
.
2014-12 AAU 11 .03
.
.
.
2014-1 ZZY 11 .11
2014-2 ZZY 6 .03
.
2014-12 ZZY 17 .09
And here's what I would like:
>>> df
day ticker b c
2014-01-01 AAU 10 .04 #same values every day in month for each ticker
2014-01-02 AAU 10 .04
2014-01-03 AAU 10 .04
.
2014-01-31 AAU 10 .04
2014-02-01 AAU 20 .03
2014-02-02 AAU 20 .03
.
2014-02-28 AAU 20 .03
.
.
.
2014-12-30 ZZY 17 .09
2014-12-31 ZZY 17 .09
I tried a groupby in combination with resampling by day, but the resulting dataframe starts at "2014-01-13" rather than January 1 and ends at "2014-12-01" rather than December 31. I also tried changing the month values from, for example, "2014-1" to "2014-01-01" and so on, but the data in the changed format still ends at "2014-01-01". There should be an easier way to do this, so I would appreciate any help. I've been going around in circles all day.
First, parse the dates of the month into Pandas timestamps:
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
# month ticker b c
# 0 2014-01-01 AAU 10 0.04
# 1 2014-02-01 AAU 20 0.03
# 2 2014-03-01 AAU 13 0.06
# 3 2014-12-01 AAU 11 0.03
# 4 2014-01-01 ZZY 11 0.11
# 5 2014-02-01 ZZY 6 0.03
# 6 2014-12-01 ZZY 17 0.09
Then pivot the DataFrame, using month as the index and ticker as a column level:
df = df.pivot(index='month', columns='ticker')
# b c
# ticker AAU ZZY AAU ZZY
# month
# 2014-01-01 10 11 0.04 0.11
# 2014-02-01 20 6 0.03 0.03
# 2014-03-01 13 NaN 0.06 NaN
# 2014-12-01 11 17 0.03 0.09
With the data pivoted, each column can be forward-filled more easily later.
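As a small illustration of the pivot step, the sketch below rebuilds a few rows of the sample data. The names `long_df` and `wide_df` and the choice of rows are assumptions made for the example, and the month column is taken to be parsed already. It shows that a ticker with no row for a given month ends up as NaN in the wide layout.

```python
import pandas as pd

# Illustrative long-format data; ZZY deliberately has no row for March.
long_df = pd.DataFrame({
    "month": pd.to_datetime(
        ["2014-01-01", "2014-02-01", "2014-03-01", "2014-01-01", "2014-02-01"]
    ),
    "ticker": ["AAU", "AAU", "AAU", "ZZY", "ZZY"],
    "b": [10, 20, 13, 11, 6],
    "c": [0.04, 0.03, 0.06, 0.11, 0.03],
})

# With no `values` argument, every remaining column (b and c) is pivoted,
# which produces two-level columns of the form (value name, ticker).
wide_df = long_df.pivot(index="month", columns="ticker")

print(wide_df)
print(wide_df.columns.tolist())
```

Because each (value, ticker) pair is now its own column, a later forward fill along the date index fills every ticker independently.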
Now find the start and end dates:
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
Note that adding pd.DateOffset(day=31) will not always result in a date that falls on the 31st. If the month is February, adding pd.DateOffset(day=31) returns the last day of February:
In [130]: pd.Timestamp('2014-2-28') + pd.DateOffset(day=31)
Out[130]: Timestamp('2014-02-28 00:00:00')
This is good, as it means adding pd.DateOffset(day=31) will always give us the last valid day of the month.
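To make the clamping behaviour concrete, here is a minimal sketch that wraps both offsets in a helper. The function name `month_bounds` and the sample dates are made up for illustration; only `pd.Timestamp` and `pd.DateOffset` come from pandas.

```python
import pandas as pd

def month_bounds(ts):
    """Return (first day, last day) of the month containing ts."""
    ts = pd.Timestamp(ts)
    first = ts - pd.DateOffset(day=1)   # day=1 replaces the day-of-month with 1
    last = ts + pd.DateOffset(day=31)   # day=31 is clamped to the month's length
    return first, last

# Months with 31, 28, 30 and 29 days respectively.
for example in ["2014-01-15", "2014-02-28", "2014-04-01", "2016-02-10"]:
    first, last = month_bounds(example)
    print(example, "->", first.date(), last.date())
```

The `day=` keyword (singular) sets an absolute day of the month, whereas `days=` (plural) would shift the date by that many days, so the two should not be confused here.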
Now we can reindex and forward-fill the DataFrame:
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
which gives
In [160]: df.head()
Out[160]:
b c
ticker AAU ZZY AAU ZZY
date
2014-01-01 10 11 0.04 0.11
2014-01-02 10 11 0.04 0.11
2014-01-03 10 11 0.04 0.11
2014-01-04 10 11 0.04 0.11
2014-01-05 10 11 0.04 0.11
In [161]: df.tail()
Out[161]:
b c
ticker AAU ZZY AAU ZZY
date
2014-12-27 11 17 0.03 0.09
2014-12-28 11 17 0.03 0.09
2014-12-29 11 17 0.03 0.09
2014-12-30 11 17 0.03 0.09
2014-12-31 11 17 0.03 0.09
To move the ticker out of the column index and back into a column:
df = df.stack('ticker')
df = df.sort_index(level=1)  # sort_index replaces sortlevel, which has been removed from pandas
df = df.reset_index()
So all together:
import pandas as pd
df = pd.read_table('data', sep=r'\s+')
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
df = df.pivot(index='month', columns='ticker')
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
df = df.stack('ticker')
df = df.sort_index(level=1)  # sort_index replaces sortlevel, which has been removed from pandas
df = df.reset_index()
gives
In [163]: df.head()
Out[163]:
date ticker b c
0 2014-01-01 AAU 10 0.04
1 2014-01-02 AAU 10 0.04
2 2014-01-03 AAU 10 0.04
3 2014-01-04 AAU 10 0.04
4 2014-01-05 AAU 10 0.04
In [164]: df.tail()
Out[164]:
date ticker b c
450 2014-12-27 ZZY 17 0.09
451 2014-12-28 ZZY 17 0.09
452 2014-12-29 ZZY 17 0.09
453 2014-12-30 ZZY 17 0.09
454 2014-12-31 ZZY 17 0.09
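For reuse, the same steps could be wrapped in a function along the following lines. This is a sketch under stated assumptions: the name `monthly_to_daily` and the sample values are invented for the example, the `month` column is assumed to hold datetimes already, and every ticker is assumed to have a row for every month.

```python
import pandas as pd

def monthly_to_daily(df):
    """Upsample long-format monthly rows (month, ticker, values...) to daily rows."""
    wide = df.pivot(index="month", columns="ticker")
    start = wide.index.min() - pd.DateOffset(day=1)   # first day of first month
    end = wide.index.max() + pd.DateOffset(day=31)    # last day of last month
    days = pd.date_range(start, end, freq="D", name="date")
    wide = wide.reindex(days, method="ffill")         # repeat each month's values
    long = wide.stack("ticker")                       # ticker back into the index
    return long.sort_index(level="ticker").reset_index()

# Made-up data: two tickers with one row per month of 2014.
months = pd.date_range("2014-01-01", periods=12, freq="MS")
monthly = pd.DataFrame({
    "month": list(months) * 2,
    "ticker": ["AAU"] * 12 + ["ZZY"] * 12,
    "b": list(range(1, 13)) + list(range(101, 113)),
    "c": [0.01 * m for m in range(1, 13)] * 2,
})

daily = monthly_to_daily(monthly)
print(daily.head())
print(len(daily))  # 2 tickers x 365 days
```

One caveat of forward filling: if a ticker has no row for some month, its daily rows for that month inherit the previous month's values rather than staying empty.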
Let's run a synthetic experiment. Say we have daily time series data:
dates = pd.date_range(start, end, freq='D')
ts = pd.Series(data, index=dates)
Create a monthly time series by averaging all data over a month:
ts_mon = ts.resample('MS').mean()  # the how= argument has been removed from resample
Now try upsampling this monthly time series back to a daily time series with uniform values throughout each month. The first method, which borrows a step from @unutbu's answer and uses reindex, works well:
ts_daily = ts_mon.reindex(dates, method='ffill')
Out:
2000-01-01 100.21
2000-01-02 100.21
...
2000-12-30 80.75
2000-12-31 80.75
The second method, using resample, does not cover the full range, because the result stops at the first day of the last month:
ts_daily = ts_mon.resample('D').ffill()
Out:
2000-01-01 100.21
2000-01-02 100.21
...
2000-11-30 99.33
2000-12-01 80.75
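The difference between the two methods can be reproduced with placeholder data, as in the sketch below. The values in `ts` are arbitrary, and the padding step at the end is only one possible workaround rather than an established recipe: it appends a copy of the last monthly value at that month's final day, so that resample has an end point to fill up to.

```python
import numpy as np
import pandas as pd

# Placeholder daily data for the year 2000 (values are arbitrary).
dates = pd.date_range("2000-01-01", "2000-12-31", freq="D")
ts = pd.Series(np.arange(len(dates), dtype=float), index=dates)
ts_mon = ts.resample("MS").mean()  # one value per month, labelled at month start

# Method 1: reindex onto the full daily range and forward-fill.
by_reindex = ts_mon.reindex(dates, method="ffill")

# Method 2: resample only spans the existing index, which ends at the
# first day of the last month.
by_resample = ts_mon.resample("D").ffill()

# Possible workaround: add a row at the last month's end before resampling.
month_end = ts_mon.index[-1] + pd.offsets.MonthEnd(0)
padded = pd.concat([ts_mon, pd.Series([ts_mon.iloc[-1]], index=[month_end])])
by_padded_resample = padded.resample("D").ffill()

print(by_reindex.index[-1].date())
print(by_resample.index[-1].date())
print(by_padded_resample.index[-1].date())
```

With the padding row in place, both approaches cover every day through the end of the last month and produce the same values.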