Pandas - merge two DataFrames with identical column names
I have two DataFrames with the same column names and identical IDs in the first column. Apart from the ID column, every cell that contains a value in one DataFrame contains NaN in the other. Here's an example of what they look like:
ID Cat1 Cat2 Cat3
1 NaN 75 NaN
2 61 NaN 84
3 NaN NaN NaN
ID Cat1 Cat2 Cat3
1 54 NaN 44
2 NaN 38 NaN
3 49 50 53
I want to combine them into one DataFrame while keeping the same column names. The result should look like this:
ID Cat1 Cat2 Cat3
1 54 75 44
2 61 38 84
3 49 50 53
I tried:
df3 = pd.merge(df1, df2, on='ID', how='outer')
This gave me a DataFrame with twice as many columns. How can I combine the values from both DataFrames into one?
In this case, combine_first works well. ( http://pandas.pydata.org/pandas-docs/version/0.13.1/merging.html )
As the name suggests, combine_first takes the first DataFrame and fills in values from the second wherever it finds a NaN in the first.
So:
df3 = df1.combine_first(df2)
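creates a new DataFrame, df3, which is essentially df1 with each NaN replaced by the corresponding value from df2 where one exists. Note that combine_first aligns rows by index label, not by the ID column, so this works as written only when both DataFrames share the same index and list the IDs in the same order.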
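If the rows might not be in the same order, one option is to make ID the index before combining, so that rows are matched by ID rather than by position. The following is a minimal sketch that rebuilds the example data from the question. The way df1 and df2 are constructed here is an assumption for illustration, not the asker's actual code.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Example frames reconstructed from the question (assumed construction).
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [nan, 61, nan],
    "Cat2": [75, nan, nan],
    "Cat3": [nan, 84, nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, nan, 49],
    "Cat2": [nan, 38, 50],
    "Cat3": [44, nan, 53],
})

# Index both frames by ID so combine_first matches rows on ID,
# then turn ID back into an ordinary column.
df3 = (
    df1.set_index("ID")
       .combine_first(df2.set_index("ID"))
       .reset_index()
)

print(df3)
```

Because the columns held NaN, the combined values come back as floats (for example 54.0 rather than 54). If integers are needed, the columns can be cast with astype(int) once no NaN remains.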