Join DataFrames using only the year and month of their datetime index
I have two dataframes with a datetime index.
import pandas as pd
d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)
d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)
df1:
            x
dat
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6
df2:
             y
dat
2016-01-01  10
2017-01-01  11
I would like to join them using only the year and month of the index. The result should look like this:
df3:
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
I tried to join them using
df1.join(df2, how='inner')
and I know I can extract the year and month parts like this:
df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))
But how can I combine all of this to achieve the desired result?
Many thanks
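A minimal sketch, rebuilding the example frames from above, of why the plain join falls short and how the strftime keys from the question could be used directly as merge keys. The column name ym is a hypothetical helper chosen for illustration, and DatetimeIndex.strftime is used in place of the index.map(lambda ...) calls; both produce the same 'YYYY-MM' strings.

```python
import pandas as pd

# Rebuild the example frames from the question.
idx1 = pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03',
                       '2017-01-01', '2017-01-02', '2017-01-03'])
df1 = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6]}, index=idx1.rename('dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   index=pd.to_datetime(['2016-01-01', '2017-01-01']).rename('dat'))

# A plain join aligns on the full timestamp, so only exact date matches survive.
exact = df1.join(df2, how='inner')

# Use the 'YYYY-MM' strings as an explicit merge key (ym is a hypothetical name).
# reset_index() turns 'dat' into a column so it survives the column-based merge.
left = df1.assign(ym=df1.index.strftime('%Y-%m')).reset_index()
right = df2.assign(ym=df2.index.strftime('%Y-%m'))
df3 = left.merge(right, on='ym').set_index('dat')[['x', 'y']]
```

Here exact keeps only the two rows dated 2016-01-01 and 2017-01-01, while the string key ignores the day and matches every row of df1 to its month in df2.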
The keys you want to join on (year and month) are not explicitly stored anywhere. Also, a merge on columns does not preserve the index, so keeping the dates in the index would lose them. Therefore, we move the index into a regular column and create two new columns to join on, namely year and month. I've wrapped this part in a function to make it clearer what is happening where.
def f(df):
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)
df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=['', '_'])
df.set_index('dat')[['x', 'y']]
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
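To see why f() moves the dates out of the index, here is a small sketch comparing a merge on the key columns alone with one that resets the index first. It uses a trimmed-down df1, and with_keys is a hypothetical helper, not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3]},
                   index=pd.to_datetime(['2016-01-01', '2016-01-02', '2017-01-01']).rename('dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   index=pd.to_datetime(['2016-01-01', '2017-01-01']).rename('dat'))

def with_keys(df):
    # Hypothetical helper: add year/month key columns derived from the DatetimeIndex.
    return df.assign(year=df.index.year, month=df.index.month)

# Merging on columns alone: the result gets a fresh integer index, the dates are gone.
lost = with_keys(df1).merge(with_keys(df2), on=['year', 'month'])

# Moving the dates into a column first (as f() does) keeps them available afterwards.
kept = (with_keys(df1).reset_index()
        .merge(with_keys(df2), on=['year', 'month'])
        .set_index('dat'))
```

In lost the index is just 0, 1, 2; kept still has the DatetimeIndex named dat.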
Here is a different approach using pd.Index.map and to_period. Build a dictionary from df2 that maps each year/month period to the corresponding value in column y. Then use map to look up the period of each date in df1.index and fetch the matching y value.
m = dict(zip(df2.index.to_period('M'), df2.y))
df1.assign(y=df1.index.to_period('M').map(m.get))
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
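One behavioral difference worth noting: the map approach keeps every row of df1, like a left join, whereas the merge-based answers perform an inner join. The sketch below assumes a modified df1 containing a month (2018-03) that df2 does not cover.

```python
import pandas as pd

# df1 here also has a month (2018-03) with no entry in df2.
df1 = pd.DataFrame({'x': [1, 2, 3]},
                   index=pd.to_datetime(['2016-01-01', '2016-01-15', '2018-03-01']).rename('dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   index=pd.to_datetime(['2016-01-01', '2017-01-01']).rename('dat'))

# Maps Period('2016-01', 'M') -> 10 and Period('2017-01', 'M') -> 11.
m = dict(zip(df2.index.to_period('M'), df2['y']))

# m.get returns None for an unmatched period, which shows up as a missing value.
out = df1.assign(y=df1.index.to_period('M').map(m.get))
```

The 2018-03-01 row is kept with a missing y. Also, if df2 had several rows in the same month, dict(zip(...)) would keep only the last one.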
Setup
dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))
dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))
You can use merge after using assign to add year and month columns derived from the DatetimeIndex:
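df3 = (df1.assign(year=df1.index.year, month=df1.index.month)
          .reset_index()  # merging on columns drops the index, so keep 'dat' as a column
          .merge(df2.assign(year=df2.index.year, month=df2.index.month),
                 on=['year', 'month'])  # 'on' cannot be combined with right_index=True
          .set_index('dat')
          .drop(['year', 'month'], axis=1))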
Output:
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
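As a variation on the answer above, a single monthly Period key can replace the separate year and month columns. This is a sketch assuming a reasonably recent pandas in which merging on period-dtype columns is supported; the column name ym is hypothetical.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

# One monthly Period key (hypothetical column 'ym') replaces year + month.
df3 = (df1.assign(ym=df1.index.to_period('M'))
          .reset_index()  # keep the dates as a column through the merge
          .merge(df2.assign(ym=df2.index.to_period('M')), on='ym')
          .set_index('dat')
          .drop(columns='ym'))
```

A single key keeps the merge call shorter and makes the intent ("same calendar month") explicit; the result has the same x and y columns as above.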