Replace data from one pandas frame with another

Question

I have two dataframes, df1 and df2. Both contain time series data, so some of the dates in df1 and df2 may overlap while others do not. I need an operation that replaces the values in df1 with the values from df2 for the dates they share, leaves the values unchanged for indices in df1 that are not present in df2, and adds rows for indices that are present in df2 but not in df1. Consider the following example:

df1:
    A   B   C   D
0   A0  B0  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   E
1   A4  B4  C4  E4
2   A5  B5  C5  E5
3   A6  B6  C6  E6
4   A7  B7  C7  E7

result df:
    A   B   C   D   E
0   A0  B0  C0  D0  NaN
1   A4  B4  C4  D4  E4
2   A5  B5  C5  D5  E5
3   A6  B6  C6  D6  E6
4   A7  B7  C7  D7  E7

As a first step I tried concatenating the two dataframes, but this results in rows with duplicate indices and I am not sure how to deal with them. How can this be achieved? Any suggestions will help.

Edit: The simpler case is when the column names are the same in both dataframes. So assume that df2 has column D instead of E, with values D4, D5, D6, D7.

Concatenation gives the following result:

pd.concat([df1, df2], axis=1)
    A    B    C    D    A    B    C    D
0   A0   B0   C0   D0  NaN  NaN  NaN  NaN  
1   A1   B1   C1   D1   A4   B4   C4   D4
2   A2   B2   C2   D2   A5   B5   C5   D5
3   A3   B3   C3   D3   A6   B6   C6   D6
4  NaN  NaN  NaN  NaN   A7   B7   C7   D7

This now introduces repeating columns. The usual solution would be to loop through each column, but I'm looking for a more elegant solution. Any ideas would be appreciated.

python pandas

john smith May 24 '15 at 12:50

2 answers

I just saw this question and realized that it is almost identical to one I asked today, which @Alexander (who also posted the accepted answer here) answered very nicely:

pd.concat([df1[~df1.index.isin(df2.index)], df2])

See "Pandas DataFrame concat / update ("upsert")?" for discussion.
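As a minimal runnable sketch of the one-liner above, using the same-column frames from the question's edit: the helper make_frame is hypothetical and exists only to build the sample data, and the sort_index() call is an addition so the rows come back in index order.

```python
import pandas as pd


def make_frame(columns, numbers, index):
    # Hypothetical helper: builds cells such as "A0", "B0", ... for the sample data.
    return pd.DataFrame(
        {c: [f"{c}{n}" for n in numbers] for c in columns}, index=index
    )


df1 = make_frame("ABCD", range(0, 4), index=[0, 1, 2, 3])
df2 = make_frame("ABCD", range(4, 8), index=[1, 2, 3, 4])

# Keep only the df1 rows whose index is absent from df2, then stack all of df2
# underneath. Rows present in both frames therefore come entirely from df2.
result = pd.concat([df1[~df1.index.isin(df2.index)], df2]).sort_index()
print(result)
```

Note that this replaces whole rows: if df2 lacked one of df1's columns, the overlapping rows would hold NaN in that column rather than keeping df1's value.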

embeepea Oct 8 '15 at 3:38

Alexander · Accepted Answer · 2015-05-24T00:57:03+0000

First, use update, which aligns on the indices of both DataFrames and overwrites the matching values in df1 in place:

df1.update(df2)

df1:
    A   B   C   D
0   A0  B0  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   D
1   A4  B4  C4  D4
2   A5  B5  C5  D5
3   A6  B6  C6  D6
4   A7  B7  C7  D7

>>> df1.update(df2)  # modifies df1 in place and returns None
>>> df1
    A   B   C   D
0  A0  B0  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6

Then you need to append the rows of df2 whose indices are missing from df1:

>>> # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
>>> pd.concat([df1, df2.loc[~df2.index.isin(df1.index)]])
    A   B   C   D
0  A0  B0  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6
4  A7  B7  C7  D7
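The two steps above can be sketched end to end as follows. The helper make_frame and the names updated, new_rows, df2_e and combined are hypothetical, chosen only for this illustration. The last part is not from the answer: it shows combine_first as one possible way to handle the question's original case, where df2 has a column E instead of D.

```python
import pandas as pd


def make_frame(columns, numbers, index):
    # Hypothetical helper: builds cells such as "A0", "B0", ... for the sample data.
    return pd.DataFrame(
        {c: [f"{c}{n}" for n in numbers] for c in columns}, index=index
    )


df1 = make_frame("ABCD", range(0, 4), index=[0, 1, 2, 3])
df2 = make_frame("ABCD", range(4, 8), index=[1, 2, 3, 4])

# Step 1: overwrite matching cells. update() works in place and returns None,
# so operate on a copy to leave the original df1 untouched.
updated = df1.copy()
updated.update(df2)

# Step 2: add the df2 rows whose index does not exist in df1 (here: index 4).
new_rows = df2.loc[~df2.index.isin(updated.index)]
result = pd.concat([updated, new_rows])
print(result)

# Variant with differing columns (E instead of D), as in the original question.
df2_e = df2.drop(columns="D").assign(E=["E4", "E5", "E6", "E7"])

# combine_first takes values from df2_e where they exist and falls back to df1,
# using the union of both indices and both sets of columns.
combined = df2_e.combine_first(df1)
print(combined)
```

With combine_first, cells that exist in neither frame (such as D for index 4 or E for index 0) come out as NaN, and column D keeps df1's values for the overlapping rows because df2_e has nothing to replace them with.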
