Replace data in one pandas DataFrame with data from another

I have two DataFrames, df1 and df2. Both contain time series data, so some of the dates in df1 and df2 may overlap while others do not. I need an operation on the two DataFrames that replaces the values in df1 with the values in df2 for matching dates, keeps the values in df1 for indices not present in df2, and adds the rows whose indices are present in df2 but not in df1. Consider the following example:

df1:
    A   B   C   D
0   A0  B0  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   E
1   A4  B4  C4  E4
2   A5  B5  C5  E5
3   A6  B6  C6  E6
4   A7  B7  C7  E7

result df:
    A   B   C   D    E
0   A0  B0  C0  D0   NaN
1   A4  B4  C4  D1   E4
2   A5  B5  C5  D2   E5
3   A6  B6  C6  D3   E6
4   A7  B7  C7  NaN  E7
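To make the requirement concrete, here is a minimal sketch that rebuilds the example with string placeholders (the construction of df1 and df2 is an assumption that simply mirrors the tables above) and produces the merged frame with DataFrame.combine_first. That method is not used in the answers below; it is shown only as one possible way to express "df2 wins, df1 fills the gaps". Because df2 has no D column, rows 1 to 3 keep D1 to D3 from df1, and row 4 gets NaN in D.

```python
import pandas as pd

# Rebuild the example: df1 has columns A-D on index 0-3,
# df2 has columns A, B, C, E on index 1-4.
df1 = pd.DataFrame({c: [f"{c}{i}" for i in range(4)] for c in "ABCD"})
df2 = pd.DataFrame(
    {c: [f"{c}{i}" for i in range(4, 8)] for c in "ABCE"}, index=range(1, 5)
)

# combine_first keeps df2's non-null values and fills its gaps from df1;
# the result's index and columns are the union of both frames.
result = df2.combine_first(df1)
print(result)
```

If df2 contains an explicit NaN, combine_first falls back to df1's value for that cell, which may or may not be what you want for time series data.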


As a first step I tried concatenating the two DataFrames, but this produces rows with duplicate indices, and I'm not sure how to deal with them. How can this be achieved? Any suggestions would help.

Edit: A simpler case is when the column names are the same in both DataFrames, so assume that df2 has column D instead of E, with values D4, D5, D6, D7.

Concatenation gives the following result:

pd.concat([df1, df2], axis=1)
    A    B    C    D    A    B    C    D
0   A0   B0   C0   D0  NaN  NaN  NaN  NaN  
1   A1   B1   C1   D1   A4   B4   C4   D4
2   A2   B2   C2   D2   A5   B5   C5   D5
3   A3   B3   C3   D3   A6   B6   C6   D6
4  NaN  NaN  NaN  NaN   A7   B7   C7   D7


This now introduces duplicate columns. The usual solution would be to loop over each column, but I'm looking for a more elegant solution. Any ideas would be appreciated.
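One way to collapse the duplicated columns without an explicit loop is sketched below; it is an illustration, not something proposed in the thread. It assumes the "simpler case" data (df2 has column D) built with string placeholders. After transposing, each duplicated column label becomes a duplicated row label, and GroupBy.last picks the last non-null value per label, which is df2's value where df2 has the row and df1's value otherwise.

```python
import pandas as pd

# Simpler case from the edit: both frames have columns A-D.
df1 = pd.DataFrame({c: [f"{c}{i}" for i in range(4)] for c in "ABCD"})
df2 = pd.DataFrame(
    {c: [f"{c}{i}" for i in range(4, 8)] for c in "ABCD"}, index=range(1, 5)
)

# Side-by-side concatenation reproduces the duplicated A-D columns.
wide = pd.concat([df1, df2], axis=1)

# Transpose so duplicate column labels become duplicate row labels, group
# on those labels, keep the last non-null value, and transpose back.
result = wide.T.groupby(level=0).last().T
print(result)
```

Like DataFrame.update, this treats a NaN in df2 as "no new value" and keeps df1's value in that cell.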


2 answers


DataFrame.update aligns on the indices of both DataFrames and modifies df1 in place:

df1.update(df2)

df1:
    A   B   C   D
0   A0  B0  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   D
1   A4  B4  C4  D4
2   A5  B5  C5  D5
3   A6  B6  C6  D6
4   A7  B7  C7  D7

>>> df1.update(df2)   # modifies df1 in place and returns None
>>> df1
    A   B   C   D
0  A0  B0  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6
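A short sketch of what update does and does not do, using placeholder data that mirrors the tables above. The NaN placed in df2 is a hypothetical tweak, not part of the question, added to show that missing values in df2 do not overwrite df1. Note also that update never adds rows: index 4 from df2 is ignored, which is why a second step is needed.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({c: [f"{c}{i}" for i in range(4)] for c in "ABCD"})
df2 = pd.DataFrame(
    {c: [f"{c}{i}" for i in range(4, 8)] for c in "ABCD"}, index=range(1, 5)
)
# Hypothetical tweak: a missing value in df2.
df2.loc[2, "B"] = np.nan

ret = df1.update(df2)  # works in place on df1
print(ret)             # update has no return value
print(df1)             # row 2 keeps B2; row 4 is not added
```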


Then append the rows of df2 whose indices are missing from df1. DataFrame.append was removed in pandas 2.0, so pd.concat is used here:

>>> pd.concat([df1, df2[~df2.index.isin(df1.index)]])
    A   B   C   D
0  A0  B0  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6
4  A7  B7  C7  D7
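The two steps can be wrapped in a small function. The name upsert is hypothetical, and the sketch copies the base frame first because update works in place; the example data again mirrors the simpler case with string placeholders.

```python
import pandas as pd


def upsert(base: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: overwrite shared rows, then append new ones."""
    out = base.copy()  # update() mutates, so leave the caller's frame intact
    out.update(new)    # overwrite cells on shared index labels (non-NA only)
    missing = new[~new.index.isin(out.index)]  # rows that exist only in `new`
    return pd.concat([out, missing])


df1 = pd.DataFrame({c: [f"{c}{i}" for i in range(4)] for c in "ABCD"})
df2 = pd.DataFrame(
    {c: [f"{c}{i}" for i in range(4, 8)] for c in "ABCD"}, index=range(1, 5)
)
result = upsert(df1, df2)
print(result)
```

update only touches columns that already exist in base, so a column present only in new (like E in the original example) would be filled only for the appended rows.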


I just saw this question and realized that it is almost identical to one I asked earlier today, which @Alexander (who posted the answer above) answered very nicely:

pd.concat([df1[~df1.index.isin(df2.index)], df2])
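This one-liner replaces whole rows rather than individual cells. The sketch below applies it to the original version of the question (df2 has E instead of D), with placeholder data mirroring the tables; it shows that the overlapping rows take all their values from df2, so D becomes NaN for rows 1 to 4.

```python
import pandas as pd

df1 = pd.DataFrame({c: [f"{c}{i}" for i in range(4)] for c in "ABCD"})
# Original version of the question: df2 has column E instead of D.
df2 = pd.DataFrame(
    {c: [f"{c}{i}" for i in range(4, 8)] for c in "ABCE"}, index=range(1, 5)
)

# Keep only the df1 rows whose index is absent from df2, then add every
# row of df2; columns are the union of both frames.
result = pd.concat([df1[~df1.index.isin(df2.index)], df2])
print(result)
```

If you need to keep df1's values in columns that df2 lacks, this row-wise replacement is not enough on its own; in the simpler case with identical columns it gives the same result as update followed by appending the new rows.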


See the question "pandas DataFrame concat / update ("upsert")?" for discussion.
