Compare two pandas frames with different size

Question

I have one massive pandas dataframe with this structure:

And a second, smaller one:

I want to add a column to the first dataframe according to this rule: df1.C = df2.H where df1.A == df2.G

I managed to do it with loops, but the dataset is massive and the code is very slow, so I'm looking for a pandas or NumPy way to do it.

Many thanks,

Boris

python numpy pandas

boris 07 June 17 at 14:02

3 answers

You can use map with a Series created by set_index:

df1['C'] = df1['A'].map(df2.set_index('G')['H'])
print (df1)
    A   B   C
0   0  12  15
1   0  15  15
2   0  17  15
3   0  18  15
4   1  45  45
5   1  78  45
6   1  96  45
7   1  32  45
8   2  45  31
9   2  78  31
10  2  44  31
11  2  10  31
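For anyone who wants to run this, here is a minimal self-contained sketch. The question's frames are not shown, so the sample data below is an assumption reconstructed from the printed output, and `lookup` and `unmatched` are names made up for illustration.

```python
import pandas as pd

# Sample data inferred from the printed output above (an assumption;
# the real frames from the question are not shown).
df1 = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "B": [12, 15, 17, 18, 45, 78, 96, 32, 45, 78, 44, 10],
})
df2 = pd.DataFrame({"G": [0, 1, 2], "H": [15, 45, 31]})

# Lookup Series: the index holds G, the values hold H.
# The values of G need to be unique for map to use it as a lookup.
lookup = df2.set_index("G")["H"]
df1["C"] = df1["A"].map(lookup)

# Values of A that do not occur in G come back as NaN.
unmatched = pd.Series([0, 5]).map(lookup)
```

The NaN behaviour is worth knowing on a large frame: a missing key does not raise, it just leaves a hole in C.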

Or merge with drop and rename:

df = (df1.merge(df2, left_on="A", right_on="G", how='left')
         .drop('G', axis=1)
         .rename(columns={'H':'C'}))
print (df)
    A   B   C
0   0  12  15
1   0  15  15
2   0  17  15
3   0  18  15
4   1  45  45
5   1  78  45
6   1  96  45
7   1  32  45
8   2  45  31
9   2  78  31
10  2  44  31
11  2  10  31
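One thing that can bite with merge on a big frame is duplicated keys in the smaller one, since every match produces a row. A small sketch of that, with made-up frames (`dup`, `inflated` and `raised` are illustrative names, not from the question), using merge's `validate` argument as a guard:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 0, 1], "B": [12, 15, 45]})
dup = pd.DataFrame({"G": [0, 0, 1], "H": [15, 16, 45]})  # G == 0 appears twice

# Without a check, each A == 0 row matches both G == 0 rows,
# so the result has 2 * 2 + 1 = 5 rows instead of 3.
inflated = df1.merge(dup, left_on="A", right_on="G", how="left")

# validate="many_to_one" makes pandas raise instead of duplicating rows.
try:
    df1.merge(dup, left_on="A", right_on="G", how="left",
              validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
```

If G is known to be unique, the check costs little and confirms the result keeps the same number of rows as df1.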

jezrael 07 June 17 at 14:06

Here's one vectorized NumPy approach:

import numpy as np
# Assumes df2.G is sorted and every value of df1.A occurs in df2.G
idx = np.searchsorted(df2.G.values, df1.A.values)
df1['C'] = df2.H.values[idx]

idx could also be computed more simply with df2.G.searchsorted(df1.A), but I don't think it would be any faster, since for performance we want to work on the underlying arrays via .values, as done above.
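Note that searchsorted assumes the searched array is sorted, and it returns an insertion point even for values that are not present. If either condition might not hold, a possible variant is sketched below. The frames are made up, and `order`, `pos` and `found` are illustrative names; this is one way to handle it, not the only one.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [2, 0, 1, 5, 2]})                # 5 has no match in G
df2 = pd.DataFrame({"G": [2, 0, 1], "H": [31, 15, 45]})   # G is not sorted

g = df2["G"].to_numpy()
h = df2["H"].to_numpy()
a = df1["A"].to_numpy()

order = np.argsort(g)                       # positions that would sort G
pos = np.searchsorted(g, a, sorter=order)   # insertion points in sorted G
pos = np.clip(pos, 0, len(g) - 1)           # keep out-of-range points indexable
idx = order[pos]                            # map back to rows of df2
found = g[idx] == a                         # True only where the key exists

# Unmatched keys get NaN, which turns the column into floats.
df1["C"] = np.where(found, h[idx], np.nan)
```

The extra argsort and the equality check add some cost, so on data that is already sorted and complete the two-line version stays the leaner choice.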

Divakar 07 June 17 at 14:10

WNG · Accepted Answer · 2017-06-07T14:05:02+0000

You probably want to use merge:

df=df1.merge(df2,left_on="A",right_on="G")

will give you a four-column frame (A, B, G, H), since both key columns are kept when their names differ

df = df.drop('G', axis=1).rename(columns={'H':'C'})

then will give you the column names you want
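Keep in mind that merge defaults to how="inner", so rows of df1 whose A has no match in G are dropped from the result. A small sketch with made-up frames showing the difference from how="left":

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 1, 3], "B": [12, 45, 99]})  # 3 does not occur in G
df2 = pd.DataFrame({"G": [0, 1, 2], "H": [15, 45, 31]})

# Default how="inner": the A == 3 row disappears.
inner = df1.merge(df2, left_on="A", right_on="G")

# how="left": every row of df1 is kept, with NaN in G and H where nothing matched.
left = df1.merge(df2, left_on="A", right_on="G", how="left")
```

If the new column must line up row for row with the original df1, how="left" is probably the safer choice.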
