In pandas, how to concatenate horizontally and then remove redundant columns

Let's say I have two data frames.

DF1: col1, col2, col3

DF2: col2, col4, col5

How can I combine the two DataFrames horizontally so that the result has col1, col2, col3, col4 and col5? Right now I am using pd.concat([DF1, DF2], axis=1), but I end up with two col2 columns. If all the values in the two col2 columns are the same, I only want to keep one of them.

+3




4 answers


Dropping duplicates should work. Since drop_duplicates removes duplicate rows, not columns, we need to transpose the DataFrame, drop the duplicates, and transpose it back.



pd.concat([DF1, DF2], axis = 1).T.drop_duplicates().T
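To make the one-liner above concrete, here is a minimal sketch on small made-up frames (the sample values are assumptions, not from the question). Note that this approach compares values only, so two differently named columns holding identical values would also collapse into one, and transposing a frame with mixed dtypes tends to leave the result with object dtype.

```python
import pandas as pd

# Hypothetical sample data; col2 holds the same values in both frames.
DF1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30], 'col3': [7, 8, 9]})
DF2 = pd.DataFrame({'col2': [10, 20, 30], 'col4': [4, 5, 6], 'col5': [0, 0, 1]})

# Side-by-side concat keeps both copies of col2.
wide = pd.concat([DF1, DF2], axis=1)

# After transposing, each original column is a row, so drop_duplicates
# removes any column whose values repeat an earlier column (first one is kept).
result = wide.T.drop_duplicates().T

print(list(result.columns))
```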

      

+3




Use difference for the columns from DF2 that are not in DF1, and just select them with []:

import pandas as pd

DF1 = pd.DataFrame(columns=['col1', 'col2', 'col3'])
DF2 = pd.DataFrame(columns=['col2', 'col4', 'col5'])


DF2 = DF2[DF2.columns.difference(DF1.columns)]
print (DF2)
Empty DataFrame
Columns: [col4, col5]
Index: []

print (pd.concat([DF1, DF2], axis = 1))
Empty DataFrame
Columns: [col1, col2, col3, col4, col5]
Index: []
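One detail worth knowing: by default, difference returns the remaining labels sorted, which can reorder DF2's columns. The sketch below uses made-up frames (the column names zeta and alpha are assumptions chosen to make the reordering visible) and passes sort=False, which to my knowledge keeps the original order in reasonably recent pandas versions.

```python
import pandas as pd

# Hypothetical frames; DF2's extra columns are deliberately not alphabetical.
DF1 = pd.DataFrame({'col1': [1, 2], 'col2': [10, 20]})
DF2 = pd.DataFrame({'col2': [10, 20], 'zeta': [0.1, 0.2], 'alpha': ['x', 'y']})

# Default behaviour: the labels not found in DF1 come back sorted.
extra_sorted = DF2.columns.difference(DF1.columns)
print(list(extra_sorted))

# sort=False asks difference() to leave the labels in DF2's own order.
extra = DF2.columns.difference(DF1.columns, sort=False)

result = pd.concat([DF1, DF2[extra]], axis=1)
print(list(result.columns))
```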

      

Timings



import numpy as np
import pandas as pd

np.random.seed(123)

N = 1000
DF1 = pd.DataFrame(np.random.rand(N,3), columns=['col1', 'col2', 'col3'])
DF2 = pd.DataFrame(np.random.rand(N,3), columns=['col2', 'col4', 'col5'])

DF2['col2'] = DF1['col2']

In [408]: %timeit (pd.concat([DF1, DF2], axis = 1).T.drop_duplicates().T)
10 loops, best of 3: 122 ms per loop

In [409]: %timeit (pd.concat([DF1, DF2[DF2.columns.difference(DF1.columns)]], axis = 1))
1000 loops, best of 3: 979 µs per loop

      


N = 10000:
In [411]: %timeit (pd.concat([DF1, DF2], axis = 1).T.drop_duplicates().T)
1 loop, best of 3: 1.4 s per loop

In [412]: %timeit (pd.concat([DF1, DF2[DF2.columns.difference(DF1.columns)]], axis = 1))
1000 loops, best of 3: 1.12 ms per loop

      

+3




DF2.drop(DF2.columns[DF2.columns.isin(DF1.columns)],axis=1,inplace=True)

      

Then

pd.concat([DF1, DF2], axis = 1)
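Because inplace=True modifies DF2 itself, here is a sketch of the same idea that leaves DF2 untouched. The sample values and the name DF2_unique are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data matching the column layout in the question.
DF1 = pd.DataFrame({'col1': [1, 2], 'col2': [10, 20], 'col3': [5, 6]})
DF2 = pd.DataFrame({'col2': [10, 20], 'col4': [3, 4], 'col5': [8, 9]})

# Boolean mask: True for DF2 columns that already exist in DF1.
shared = DF2.columns.isin(DF1.columns)

# Without inplace=True, drop() returns a new frame and DF2 stays as it was.
DF2_unique = DF2.drop(columns=DF2.columns[shared])

result = pd.concat([DF1, DF2_unique], axis=1)
print(list(result.columns))
print(list(DF2.columns))  # DF2 still contains col2
```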

      

0




You can proceed like this:

- first remove col2 from DF2

- then concatenate the two data frames
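The two steps above can be sketched as follows, with one addition: the question only wants a shared column removed when its values match, so this sketch checks that first. The helper name concat_dropping_equal_columns is hypothetical, and it assumes column names are unique within each frame.

```python
import pandas as pd

def concat_dropping_equal_columns(left, right):
    """Hypothetical helper: concatenate side by side, dropping from `right`
    only those shared columns whose values equal the ones in `left`."""
    shared = [c for c in right.columns if c in left.columns]
    # Series.equals is True only when values, dtype and index all match.
    redundant = [c for c in shared if left[c].equals(right[c])]
    return pd.concat([left, right.drop(columns=redundant)], axis=1)

# Hypothetical sample data: col2 is identical in DF1 and DF2.
DF1 = pd.DataFrame({'col1': [1, 2], 'col2': [10, 20], 'col3': [5, 6]})
DF2 = pd.DataFrame({'col2': [10, 20], 'col4': [3, 4], 'col5': [8, 9]})
same = concat_dropping_equal_columns(DF1, DF2)

# DF3 has a col2 that differs from DF1's, so both copies are kept.
DF3 = DF2.assign(col2=[10, 99])
differs = concat_dropping_equal_columns(DF1, DF3)

print(list(same.columns))
print(list(differs.columns))
```

Keeping both columns when the values differ makes the mismatch visible instead of silently discarding data.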

-2

