Pandas - merge two DataFrames with identical column names

Question

I have two DataFrames with the same column names and identical IDs in the first column. Except for the ID column, every cell that contains a value in one DataFrame contains a NaN in the other. Here's an example of what they look like:

ID    Cat1    Cat2    Cat3
1     NaN     75      NaN
2     61      NaN     84
3     NaN     NaN     NaN


ID    Cat1    Cat2    Cat3
1     54      NaN     44
2     NaN     38      NaN
3     49      50      53

I want to combine them into one DataFrame while keeping the same column names. Thus, the result will look like this:

ID    Cat1    Cat2    Cat3
1     54      75      44
2     61      38      84
3     49      50      53

I tried:

df3 = pd.merge(df1, df2, on='ID', how='outer')

This gave me a DataFrame with twice as many columns. How can I combine the values from both DataFrames into one?

+3

python merge pandas dataframe

Slavatron 05 Aug 14 at 17:56

2 answers

In this case, combine_first works well. ( http://pandas.pydata.org/pandas-docs/version/0.13.1/merging.html )

As the name suggests, combine_first takes the first DataFrame and fills in values from the second wherever the first has a NaN.

So:

df3 = df1.combine_first(df2)

creates a new DataFrame, df3, which is df1 with its NaN cells filled from df2 wherever df2 has a value.
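A minimal sketch of the above, using the sample data from the question. One assumption worth flagging: combine_first matches rows by index label, not by the ID column, so this sketch sets ID as the index first. With the default index it would also work here, but only because the rows happen to be in the same order.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
})

# Align on ID, fill df1's NaN cells from df2, then restore ID as a column
df3 = df1.set_index("ID").combine_first(df2.set_index("ID")).reset_index()
print(df3)
```

Neither df1 nor df2 is modified; the filled result is returned as a new DataFrame.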

+2

Slavatron 05 Aug '14 at 18:00

Roger fan · Accepted Answer · 2014-08-05T18:00:35+0000

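You probably want df.update. See the documentation.

df1.update(df2, errors='raise')

update modifies df1 in place and returns None. With errors='raise' (spelled raise_conflict=True before pandas 0.24), it raises a ValueError if both DataFrames hold a non-NaN value in the same cell. That check covers every shared column, so ID has to be the index rather than a regular column.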
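A hedged sketch of the update approach on the question's data, assuming pandas 0.24 or later (where the keyword is errors='raise'). ID is moved into the index here, since otherwise the identical ID values in both frames would count as a conflict.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question, indexed by ID
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
}).set_index("ID")
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
}).set_index("ID")

# In-place: copies df2's non-NaN values into df1 and returns None.
# errors="raise" raises ValueError if a cell is non-NaN in both frames.
df1.update(df2, errors="raise")

result = df1.reset_index()
print(result)
```

Unlike combine_first, this changes df1 itself, so take a copy first if the original frame is still needed.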
