Replace data from one pandas frame with another
I have two dataframes, df1 and df2. Both contain time series data, so some of the dates in df1 and df2 overlap while others do not. I need an operation on the two dataframes that replaces the values in df1 with the values in df2 for the dates they share, keeps the existing values for indices in df1 that are not present in df2, and adds the rows for indices that are present in df2 but not in df1. Consider the following example:
df1:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
df2:
A B C E
1 A4 B4 C4 E4
2 A5 B5 C5 E5
3 A6 B6 C6 E6
4 A7 B7 C7 E7
result df:
A B C D E
0 A0 B0 C0 D0 NaN
1 A4 B4 C4 D4 E4
2 A5 B5 C5 D5 E5
3 A6 B6 C6 D6 E6
4 A7 B7 C7 D7 E7
I tried starting by concatenating the two dataframes, but that produces rows with duplicate indices, and I am not sure how to deal with them. How can this be achieved? Any suggestions would help.
Edit: A simpler case is when the column names are the same in both dataframes. So assume that df2 has column D instead of E, with values D4, D5, D6, D7.
Concatenation gives the following result:
pd.concat([df1, df2], axis=1)
A B C D A B C D
0 A0 B0 C0 D0 NaN NaN NaN NaN
1 A1 B1 C1 D1 A4 B4 C4 D4
2 A2 B2 C2 D2 A5 B5 C5 D5
3 A3 B3 C3 D3 A6 B6 C6 D6
4 NaN NaN NaN NaN A7 B7 C7 D7
This now introduces repeating columns. The usual solution would be to loop through each column, but I'm looking for a more elegant solution. Any ideas would be appreciated.
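For reference, the duplicate-index result of a plain row-wise concatenation can be collapsed directly. The sketch below is one possible approach, assuming the simpler case where both frames share columns A to D. The frames are rebuilt from the example data, and the names `stacked` and `result` are only illustrative.

```python
import pandas as pd

cols = ["A", "B", "C", "D"]
# Rebuild the example frames (same columns in both, as in the edit above).
df1 = pd.DataFrame([[f"{c}{i}" for c in cols] for i in range(4)],
                   columns=cols, index=[0, 1, 2, 3])
df2 = pd.DataFrame([[f"{c}{i}" for c in cols] for i in range(4, 8)],
                   columns=cols, index=[1, 2, 3, 4])

# Stack the rows; indices 1, 2 and 3 now appear twice (df1's row, then df2's).
stacked = pd.concat([df1, df2])

# Keep only the last occurrence of each index label, i.e. the row from df2,
# then restore the index order.
result = stacked[~stacked.index.duplicated(keep="last")].sort_index()
print(result)
```

This replaces whole rows, so it only matches the requirement when both frames have the same columns.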
update aligns on the indices of both DataFrames:
df1.update(df2)
df1:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
df2:
A B C D
1 A4 B4 C4 D4
2 A5 B5 C5 D5
3 A6 B6 C6 D6
4 A7 B7 C7 D7
>>> df1.update(df2)  # modifies df1 in place and returns None
>>> df1
A B C D
0 A0 B0 C0 D0
1 A4 B4 C4 D4
2 A5 B5 C5 D5
3 A6 B6 C6 D6
Then you need to add the rows of df2 whose indices are missing from df1 (DataFrame.append was removed in pandas 2.0, so pd.concat is used here):
>>> pd.concat([df1, df2.loc[~df2.index.isin(df1.index)]])
A B C D
0 A0 B0 C0 D0
1 A4 B4 C4 D4
2 A5 B5 C5 D5
3 A6 B6 C6 D6
4 A7 B7 C7 D7
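The two steps above can be put together as a small runnable sketch. It assumes the same-column example data, works on a copy so that the original df1 is left untouched, and uses the illustrative names `result` and `missing`, which are not part of the answer.

```python
import pandas as pd

cols = ["A", "B", "C", "D"]
# Rebuild the example frames from the answer.
df1 = pd.DataFrame([[f"{c}{i}" for c in cols] for i in range(4)],
                   columns=cols, index=[0, 1, 2, 3])
df2 = pd.DataFrame([[f"{c}{i}" for c in cols] for i in range(4, 8)],
                   columns=cols, index=[1, 2, 3, 4])

# Step 1: overwrite matching index/column cells with df2's non-NA values.
# update() works in place and returns None, so operate on a copy.
result = df1.copy()
result.update(df2)

# Step 2: append the rows of df2 whose index is not yet in the result.
missing = df2.loc[~df2.index.isin(result.index)]
result = pd.concat([result, missing])
print(result)
```

Note that update only copies non-NA values, so a NaN in df2 will not overwrite an existing value in df1.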
I just saw this question and realized that it is almost identical to what I just asked today and that @Alexander (answer poster above) answered very nicely:
pd.concat([df1[~df1.index.isin(df2.index)], df2])
See Pandas DataFrame concat / update ("upsert")? for discussion.
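As a rough illustration of the one-liner, the sketch below applies it to the question's original data, where df2 has column E instead of D. Because the one-liner replaces whole rows, the D values of the overlapping rows are lost in that case. As a possible alternative that is not taken from the answers here, `combine_first` merges cell by cell and keeps them. The names `upsert` and `merged` are only illustrative.

```python
import pandas as pd

cols1 = ["A", "B", "C", "D"]
cols2 = ["A", "B", "C", "E"]
# Rebuild the frames from the question's first example (df2 has E, not D).
df1 = pd.DataFrame([[f"{c}{i}" for c in cols1] for i in range(4)],
                   columns=cols1, index=[0, 1, 2, 3])
df2 = pd.DataFrame([[f"{c}{i}" for c in cols2] for i in range(4, 8)],
                   columns=cols2, index=[1, 2, 3, 4])

# Row-level "upsert": keep df1 rows that df2 does not cover, then add all of df2.
# Rows 1 to 3 come entirely from df2, so their D values become NaN.
upsert = pd.concat([df1[~df1.index.isin(df2.index)], df2])

# Cell-level alternative (an assumption, not from the answers above):
# prefer df2's values and fall back to df1 wherever df2 has nothing.
merged = df2.combine_first(df1)

print(upsert)
print(merged)
```

With identical columns in both frames the two results contain the same values, so the difference only matters when the column sets differ.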