Pandas left merge data separator without reindexing
I wonder if there is a more intuitive way to merge data frames
In[140]: df1 = pd.DataFrame(data=[[1,2],[3,4],[10,4],[5,6]], columns=['A','B'], index=[1,3,5,7])
In[141]: df1
Out[141]:
A B
1 1 2
3 3 4
5 10 4
7 5 6
In[142]: df2 = pd.DataFrame(data=[[1,5],[3,4],[10,3],[5,2]], columns=['A','C'], index=[0,2,4,6])
In[143]: df2
Out[143]:
A C
0 1 5
2 3 4
4 10 3
6 5 2
My desired concatenation should look like this:
A B C
1 1 2 5
3 3 4 4
5 10 4 3
7 5 6 2
The key is to keep the original left index of the data.
Left merge doesn't work because it reindexes
In[150]: pd.merge(df1, df2, how='left', on='A')
Out[150]:
A B C
0 1 2 5
1 3 4 4
2 10 4 3
3 5 6 2
After some trial and error, it turned out that it works, but wonder if there is a more intuitive way to achieve the same.
In[151]: pd.merge(df1, df2, how='outer', on=['A'], right_index=True)
Out[151]:
A B C
1 1 2 5
3 3 4 4
5 10 4 3
7 5 6 2
+3
source to share
1 answer
pd.merge(df1, df2, how='outer', on=['A'], right_index=True)
looks a little strange to me. He says to join two tables in column A and also the index of the right table where there is nothing in the left table. I wonder why this works.
I would do something like this:
In [27]: df1['index'] = df1.index
In [28]: df2['index'] = df2.index
In [33]: df_merge = pd.merge(df1, df2, how='left', on=['A'])
In [34]: df_merge
Out[34]:
A B index_x C index_y
0 1 2 1 5 1
1 3 4 3 4 2
2 10 4 5 3 3
3 5 6 7 2 4
In [35]: df_merge = df_merge[['A', 'B', 'C', 'index_x']]
In [36]: df_merge
Out[36]:
A B C index_x
0 1 2 5 1
1 3 4 4 3
2 10 4 3 5
3 5 6 2 7
[4 rows x 4 columns]
In [37]: df_merge.set_index(['index_x'])
Out[37]:
A B C
index_x
1 1 2 5
3 3 4 4
5 10 4 3
7 5 6 2
+2
source to share