Pandas Concatenate two DataFrames without some columns

Context

I am trying to merge two large CSV files together.

Problem

Let's say I have one Pandas DataFrame as shown below ...

EntityNum    foo   ...
------------------------
1001.01      100
1002.02       50
1003.03      200

      

And one more such ...

EntityNum    a_col    b_col
-----------------------------------
1001.01      alice        7  
1002.02        bob        8
1003.03        777        9

      

I would like to join them like this:

EntityNum    foo    a_col
----------------------------
1001.01      100    alice
1002.02       50      bob
1003.03      200      777

      

Note that I don't want b_col in the final result. How can I do this using Pandas?

In SQL, I would probably have written something like:

SELECT t1.*, t2.a_col FROM table_1 as t1
                      LEFT JOIN table_2 as t2
                      ON t1.EntityNum = t2.EntityNum; 

      

Search

I know that merge can be used. This is what I tried:

import pandas as pd

df_a = pd.read_csv(path_a, sep=',')
df_b = pd.read_csv(path_b, sep=',')
df_c = pd.merge(df_a, df_b, on='EntityNum')

      

But I got stuck when it came to avoiding some unwanted columns in the final frame.



2 answers


You can first select the relevant columns of each frame by label (for example, df_a[['EntityNum', 'foo']]) and then join the results.

df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']], on='EntityNum', how='left')

      



Note that merge performs an inner join by default, so how='left' is needed to match the SQL LEFT JOIN.
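As a minimal sketch of the difference, the frames below mirror the tables in the question. The extra 1004.04 row in df_a is a hypothetical addition with no match in df_b, included only to make the join type visible.

```python
import pandas as pd

# Sample frames mirroring the question; the 1004.04 row is a hypothetical
# addition with no match in df_b, to make the join type visible.
df_a = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03, 1004.04],
    "foo": [100, 50, 200, 75],
})
df_b = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "a_col": ["alice", "bob", "777"],
    "b_col": [7, 8, 9],
})

# Only the key and the wanted column from df_b take part in the merge.
right = df_b[["EntityNum", "a_col"]]

# how="left" keeps every row of df_a, like SQL LEFT JOIN.
left_joined = df_a.merge(right, on="EntityNum", how="left")

# The default (how="inner") keeps only rows whose key exists in both frames.
inner_joined = df_a.merge(right, on="EntityNum")

print(left_joined)
print(inner_joined)
```

With the left join, the unmatched row stays in the result with a missing a_col value; with the default inner join it is dropped.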



Notice that in SQL you do the join first and then select the columns you want. In a similar vein, you can merge the complete frames in Pandas and then select the columns you want.

Alternatively, merge the complete frames and del the columns you don't want.
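Both merge-first variants might look roughly like this. The sample frames are stand-ins built from the tables in the question, and the variable names (merged, selected, dropped) are chosen here for illustration.

```python
import pandas as pd

# Stand-in frames built from the tables in the question.
df_a = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "foo": [100, 50, 200],
})
df_b = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "a_col": ["alice", "bob", "777"],
    "b_col": [7, 8, 9],
})

# Merge everything first; the result still contains b_col.
merged = df_a.merge(df_b, on="EntityNum", how="left")

# Variant 1: keep an explicit list of wanted columns.
selected = merged[["EntityNum", "foo", "a_col"]]

# Variant 2: return a new frame without the unwanted column.
dropped = merged.drop(columns=["b_col"])

# Variant 3: remove the column from the merged frame in place.
del merged["b_col"]

print(selected)
```

For large files, merging first carries the unwanted columns through the join, so selecting before the merge is usually the lighter option.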



Finally, you can first select the columns you want and then do the join.
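Since the frames in the question come from CSV files, one way to select first is to skip the unwanted columns while reading, using the usecols parameter of read_csv. This is a sketch under assumptions: the in-memory strings below are hypothetical stand-ins for the real files behind path_a and path_b.

```python
import io
import pandas as pd

# In-memory stand-ins for the two CSV files (hypothetical contents).
csv_a = io.StringIO("EntityNum,foo\n1001.01,100\n1002.02,50\n1003.03,200\n")
csv_b = io.StringIO(
    "EntityNum,a_col,b_col\n1001.01,alice,7\n1002.02,bob,8\n1003.03,777,9\n"
)

df_a = pd.read_csv(csv_a)

# usecols limits parsing to the listed columns, so b_col is never loaded.
df_b = pd.read_csv(csv_b, usecols=["EntityNum", "a_col"])

df_c = df_a.merge(df_b, on="EntityNum", how="left")
print(df_c)
```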
