Pandas Concatenate two DataFrames without some columns
Context
I am trying to merge two large CSV files together.
Problem
Let's say I have one pandas DataFrame like this:
EntityNum  foo  ...
-------------------
1001.01    100
1002.02    50
1003.03    200
And another one like this:
EntityNum  a_col  b_col
-----------------------
1001.01    alice  7
1002.02    bob    8
1003.03    777    9
I would like to join them like this:
EntityNum  foo  a_col
---------------------
1001.01    100  alice
1002.02    50   bob
1003.03    200  777
Note that I don't want b_col in the final result. How can I do this with pandas?
In SQL, I would do something like:
SELECT t1.*, t2.a_col FROM table_1 as t1
LEFT JOIN table_2 as t2
ON t1.EntityNum = t2.EntityNum;
Search
I know that merge can be used. This is what I tried:
import pandas as pd
df_a = pd.read_csv(path_a, sep=',')
df_b = pd.read_csv(path_b, sep=',')
df_c = pd.merge(df_a, df_b, on='EntityNum')
But I got stuck when it came to avoiding some unwanted columns in the final frame.
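A self-contained reproduction of the attempt above may help. It is a sketch that assumes the sample tables are built in memory instead of read from the CSV files (path_a and path_b are not shown), and it gives df_a only the foo column from its "..." header. It shows that a plain merge keeps every non-key column, including b_col:

```python
import pandas as pd

# Sample data from the question, built in memory instead of read from CSV
# (assumption: df_a has only the "foo" column besides the key).
df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "foo": [100, 50, 200]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "a_col": ["alice", "bob", "777"],
                     "b_col": [7, 8, 9]})

# Merge on the key column that actually exists in both frames.
df_c = pd.merge(df_a, df_b, on="EntityNum")

# All non-key columns from both frames survive, including the unwanted b_col.
print(list(df_c.columns))  # ['EntityNum', 'foo', 'a_col', 'b_col']
```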
You can first select the relevant columns of each DataFrame by label (for example, df_a[['EntityNum', 'foo']]) and then join them:
df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']], on='EntityNum', how='left')
Note that merge performs an inner join by default; how='left' makes it behave like the SQL LEFT JOIN.
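To see why how='left' matters, here is a small sketch using the question's sample data plus one hypothetical extra row (1004.04) in df_a that has no match in df_b. The extra row is invented purely for illustration:

```python
import pandas as pd

# Question's sample data plus a hypothetical unmatched row (1004.04) in df_a.
df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03, 1004.04],
                     "foo": [100, 50, 200, 300]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "a_col": ["alice", "bob", "777"],
                     "b_col": [7, 8, 9]})

# Select only the wanted columns from each frame before joining.
left_cols = df_a[["EntityNum", "foo"]]
right_cols = df_b[["EntityNum", "a_col"]]

# Default (inner join): the unmatched row 1004.04 is dropped.
inner = left_cols.merge(right_cols, on="EntityNum")

# how="left" mirrors the SQL LEFT JOIN: 1004.04 is kept, with a_col as NaN.
left = left_cols.merge(right_cols, on="EntityNum", how="left")

print(len(inner), len(left))  # 3 4
print(left)
```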
Notice that in SQL you do the join first and then select the columns you want. In the same way, in pandas you can merge the complete DataFrames first and then select the columns you want.
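A minimal sketch of the join-then-select order, assuming the question's sample data built in memory. Here "full join" is read as merging the frames with all of their columns, not as a FULL OUTER JOIN (how='outer'); how='left' is used to match the question's SQL:

```python
import pandas as pd

df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "foo": [100, 50, 200]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "a_col": ["alice", "bob", "777"],
                     "b_col": [7, 8, 9]})

# Join the complete frames first, like the SQL LEFT JOIN ...
merged = df_a.merge(df_b, on="EntityNum", how="left")

# ... then keep only the wanted columns, like the SQL SELECT list.
result = merged[["EntityNum", "foo", "a_col"]]
print(result)
```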
Alternatively, merge the complete DataFrames and then del the columns you don't want.
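The del variant could look like this, again assuming the question's sample data built in memory. del mutates the merged frame in place; DataFrame.drop(columns=...) is the non-mutating alternative:

```python
import pandas as pd

df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "foo": [100, 50, 200]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "a_col": ["alice", "bob", "777"],
                     "b_col": [7, 8, 9]})

merged = df_a.merge(df_b, on="EntityNum", how="left")

# Remove the unwanted column from the merged frame in place.
del merged["b_col"]

# Non-mutating equivalent: merged = merged.drop(columns=["b_col"])
print(merged)
```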
Finally, you can first select the columns you want and then do the join.
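Because the question's SQL keeps all of table_1 (t1.*) and only a_col from table_2, one way to select first is to trim only df_b before merging. This is a sketch on the question's sample data built in memory, so b_col never enters the merge:

```python
import pandas as pd

df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "foo": [100, 50, 200]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "a_col": ["alice", "bob", "777"],
                     "b_col": [7, 8, 9]})

# Keep every column of df_a (like t1.*) but only the key and a_col
# from df_b (like t2.a_col), then left-join on the key.
result = df_a.merge(df_b[["EntityNum", "a_col"]], on="EntityNum", how="left")
print(result)
```

With large CSV files, read_csv's usecols parameter can make the same selection at load time, for example pd.read_csv(path_b, usecols=['EntityNum', 'a_col']).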