Compare two pandas DataFrames of different sizes

I have one massive pandas dataframe with this structure:

df1:
    A   B
0   0  12
1   0  15
2   0  17
3   0  18
4   1  45
5   1  78
6   1  96
7   1  32
8   2  45
9   2  78
10  2  44
11  2  10

      

And a second, smaller one:

df2:
   G   H
0  0  15
1  1  45
2  2  31

      

I want to add a column to my first dataframe according to this rule: df1.C = df2.H where df1.A == df2.G
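For anyone who wants to reproduce this, the two frames above can be rebuilt roughly as follows. This is only a sketch of the sample data as printed; the real data is much larger and its dtypes are an assumption.

```python
import pandas as pd

# Rebuild the two example frames shown above (integer dtypes assumed).
df1 = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "B": [12, 15, 17, 18, 45, 78, 96, 32, 45, 78, 44, 10],
})
df2 = pd.DataFrame({"G": [0, 1, 2], "H": [15, 45, 31]})

print(df1.shape, df2.shape)
```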

I managed to do it with loops, but the data is massive and the code is very slow, so I'm looking for a pandas or NumPy way to do it.

Many thanks,

Boris

3 answers


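You probably want to use merge:

df = df1.merge(df2, left_on="A", right_on="G")

This gives you a four-column frame (A, B, G, H): both key columns are kept because their names differ, and the looked-up values sit in H. Note that the default inner join drops any df1 rows whose A has no match in G; pass how="left" to keep them.

df = df.drop(columns="G").rename(columns={"H": "C"})

This then gives you the column names you want.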
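A small self-contained sketch of the merge approach, run on a shortened copy of the example data. It prints the column names of the merged frame before any renaming, so the shape of the result can be checked directly; the variable names `merged` and `result` are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 0, 1, 2], "B": [12, 15, 45, 78]})
df2 = pd.DataFrame({"G": [0, 1, 2], "H": [15, 45, 31]})

# Default inner join on differently named key columns.
merged = df1.merge(df2, left_on="A", right_on="G")
print(list(merged.columns))  # shows which columns the merge result carries

# Drop the redundant key column and rename the looked-up one.
result = merged.drop(columns="G").rename(columns={"H": "C"})
print(result)
```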



You can use map with a Series created by set_index:

df1['C'] = df1['A'].map(df2.set_index('G')['H'])
print(df1)
    A   B   C
0   0  12  15
1   0  15  15
2   0  17  15
3   0  18  15
4   1  45  45
5   1  78  45
6   1  96  45
7   1  32  45
8   2  45  31
9   2  78  31
10  2  44  31
11  2  10  31
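One thing worth knowing about the map approach is what happens when a value of A has no counterpart in G. The sketch below uses a made-up extra key (3) to show that such rows come back as NaN, which also turns the new column into floats; the -1 fill value is an arbitrary placeholder, not something from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 1, 2, 3], "B": [12, 45, 78, 99]})  # key 3 has no match
df2 = pd.DataFrame({"G": [0, 1, 2], "H": [15, 45, 31]})

lookup = df2.set_index("G")["H"]      # Series: index = G, values = H
df1["C"] = df1["A"].map(lookup)       # NaN where A is absent from G

# Optional: replace the gaps and go back to integers (-1 is arbitrary).
df1["C_filled"] = df1["C"].fillna(-1).astype(int)
print(df1)
```

Note that the lookup Series should have unique index values for this to work as a plain lookup, so df2.G is assumed to contain no duplicates.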

      



Or merge with drop and rename:

df = (df1.merge(df2, left_on="A", right_on="G", how='left')
         .drop('G', axis=1)
         .rename(columns={'H': 'C'}))
print(df)
    A   B   C
0   0  12  15
1   0  15  15
2   0  17  15
3   0  18  15
4   1  45  45
5   1  78  45
6   1  96  45
7   1  32  45
8   2  45  31
9   2  78  31
10  2  44  31
11  2  10  31
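With a left merge, duplicate keys in the smaller frame silently multiply rows in the result. As a hedged sketch, merge's validate argument can guard against that; the frame `df2_bad` with a repeated key is invented here purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 0, 1, 2], "B": [12, 15, 45, 78]})
df2_bad = pd.DataFrame({"G": [0, 1, 1, 2], "H": [15, 45, 46, 31]})  # key 1 appears twice

# Without validation the result has more rows than df1.
unchecked = df1.merge(df2_bad, left_on="A", right_on="G", how="left")
print(len(df1), len(unchecked))

# validate="many_to_one" requires the right-hand keys to be unique.
try:
    df1.merge(df2_bad, left_on="A", right_on="G", how="left",
              validate="many_to_one")
    raised = False
except pd.errors.MergeError as exc:
    raised = True
    print("merge refused:", exc)
```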

      



Here's one vectorized NumPy approach:

import numpy as np
# Assumes df2.G is sorted and contains every value that occurs in df1.A
idx = np.searchsorted(df2.G.values, df1.A.values)
df1['C'] = df2.H.values[idx]

      

idx could also have been computed more simply with df2.G.searchsorted(df1.A), but I don't think it would be more efficient, because we want to work on the underlying arrays via .values for performance, as we did above.
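searchsorted returns insertion points in a sorted array, so the approach relies on df2.G being sorted and on every value of df1.A being present in it. A more defensive sketch, assuming neither holds automatically, could sort the keys first and verify the matches; the names `order`, `sorted_g`, `sorted_h` and `pos` are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [0, 0, 1, 2, 2], "B": [12, 15, 45, 78, 10]})
df2 = pd.DataFrame({"G": [2, 0, 1], "H": [31, 15, 45]})  # deliberately unsorted

a = df1["A"].to_numpy()
order = np.argsort(df2["G"].to_numpy())     # permutation that sorts the keys
sorted_g = df2["G"].to_numpy()[order]
sorted_h = df2["H"].to_numpy()[order]

pos = np.searchsorted(sorted_g, a)          # insertion points, not guaranteed matches
pos = np.clip(pos, 0, len(sorted_g) - 1)    # keep indices in range for the check below
if not np.array_equal(sorted_g[pos], a):
    raise KeyError("some values of df1.A are missing from df2.G")

df1["C"] = sorted_h[pos]
print(df1)
```

The extra sort and check cost some time, so on data that is already known to be sorted and complete the two-line version is the faster choice.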
