Join dataframes using only the year and month of a datetime index

I have two dataframes with a datetime index.

import pandas as pd

d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)

d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)

      

df1:

            x
dat          
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6

      

df2:

             y
dat           
2016-01-01  10
2017-01-01  11

      

I would like to join them using only the year and month portion of the index, so the result would look like this:

DF3:

            x  y
dat          
2016-01-01  1  10 
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11

      

I tried to join them using

df1.join(df2, how='inner')

      

and I know I can extract the year and month parts like this:

df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))

      

But I am wondering how I can combine all of this to achieve the desired result.

Many thanks



2 answers


The information you want to merge on is not explicitly defined anywhere, and there is no good way to keep the dates in the index during a merge without losing them. So we move the index into the dataframe as a regular column and create two new columns to merge on, namely year and month. I've wrapped this part in a function to make it clearer what happens where.

def f(df):
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=['', '_'])

df.set_index('dat')[['x', 'y']]

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
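
To see why the helper columns are needed, here is a minimal sketch of what a plain index join does with the question's data. The variable names inner and left are only illustrative. Because the join compares full timestamps, only the rows dated on the first of the month find a partner in df2.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, index=pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, index=pd.DatetimeIndex(dates2, name='dat'))

# An inner join keeps only timestamps present in both indexes.
inner = df1.join(df2, how='inner')

# A left join keeps every row of df1, but y is missing wherever the
# exact date does not exist in df2.
left = df1.join(df2, how='left')

print(inner)
print(left)
```

Neither result fills y for the second and third day of each month, which is why the join key has to be reduced to year and month first.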

      


A different approach uses pd.Index.map and to_period. Create a dictionary from df2 that maps each year/month period object to the corresponding value in column y. Then use map to match the periods in df1.index to the correct y values.



m = dict(zip(df2.index.to_period('M'), df2.y))
df1.assign(y=df1.index.to_period('M').map(m.get))

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
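
One point worth checking with the dictionary approach is what happens when df1 contains a month that df2 does not cover. The sketch below assumes a slightly modified df1 with an extra 2018 row, which is not part of the original question. Since m.get returns None for an unknown key, that row ends up with a missing y instead of raising an error.

```python
import pandas as pd

# Assumed variation of the question's data: 2018-01 has no match in df2.
dates1 = ['2016-01-01', '2016-01-02', '2017-01-01', '2018-01-01']
df1 = pd.DataFrame({'x': [1, 2, 3, 4]}, index=pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, index=pd.DatetimeIndex(dates2, name='dat'))

# Monthly period -> y value, as in the answer above.
m = dict(zip(df2.index.to_period('M'), df2['y']))

result = df1.assign(y=df1.index.to_period('M').map(m.get))

# Rows whose month was not found in df2.
missing = result[result['y'].isna()]

print(result)
print(missing)
```

If unmatched months should be dropped rather than kept, result.dropna(subset=['y']) would be one way to do it.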

      


Setup

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))

      



You can use merge together with assign to create year and month columns from the DatetimeIndex:

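df3 = (df1.assign(year=df1.index.year, month=df1.index.month)
          .reset_index()  # keep 'dat' as a column so it survives the merge
          .merge(df2.assign(year=df2.index.year, month=df2.index.month), on=['year', 'month'])
          .set_index('dat')  # restore the original datetime index
          .drop(['year', 'month'], axis=1))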

      



Output:

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
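
As a variation that ties back to the strftime idea from the question, the same merge can be sketched with a single string key instead of separate year and month columns. The column name ym and the intermediate names left and right are hypothetical, chosen only for this example. The validate argument is optional and here just checks that each month appears at most once in df2.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, index=pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, index=pd.DatetimeIndex(dates2, name='dat'))

# 'ym' is a hypothetical helper column holding 'YYYY-MM' strings.
# df1 keeps its dates as a column; df2's dates are not needed afterwards.
left = df1.assign(ym=df1.index.strftime('%Y-%m')).reset_index()
right = df2.assign(ym=df2.index.strftime('%Y-%m')).reset_index(drop=True)

df3 = (left.merge(right, on='ym', how='left', validate='many_to_one')
           .set_index('dat')       # restore df1's datetime index
           .drop(columns='ym'))    # remove the helper key

print(df3)
```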

      
