How to simplify the data merging process?

Question

I have multiple data frames with the columns `coupon_id` and `rating`. I want to combine them into a single dataframe that contains every `coupon_id`, with `rating` being the sum of all ratings for that `coupon_id` across all the data frames.

For example, say I have two data frames:

| coupon_id | rating |
| :-------- | -----: |
| 1 | 40 |
| 2 | 60 |
| 3 | 50 |

| coupon_id | rating |
| :-------- | -----: |
| 4 | 70 |
| 2 | 80 |
| 3 | 60 |

As a result, I want to get this dataframe:

| coupon_id | rating |
| :-------- | -----: |
| 1 | 40 |
| 2 | 140 |
| 3 | 110 |
| 4 | 70 |

I am currently using the code below. It works, but it is inefficient:

    similar_users_ratings = pd.DataFrame(columns=['coupon_id', 'rating'])

    for similarUser in most_similar_users:
        similar_user_ratings = self.ratingData.loc[self.ratingData['patient_id'] == similarUser[0], :].copy()
        similar_user_ratings.loc[:, 'rating'] = similar_user_ratings.loc[:, 'rating'].apply(lambda x: int(x) * similarUser[1])
        del similar_user_ratings['patient_id']
        similar_users_ratings = similar_users_ratings.merge(similar_user_ratings, on='coupon_id', how='outer')
        similar_users_ratings['rating_y'].fillna(.0, inplace=True)
        similar_users_ratings['rating_x'].fillna(.0, inplace=True)
        similar_users_ratings['rating'] = similar_users_ratings['rating_x'] + similar_users_ratings['rating_y']
        del similar_users_ratings['rating_y']
        del similar_users_ratings['rating_x']

How can I simplify this piece of code? Thanks.

Actually I have several dataframes like:

        coupon_id  rating
    69         12       1

         coupon_id  rating
    101         37       1

         coupon_id  rating
    428         11       1

          coupon_id  rating
    1133         11       1

Desired dataset:

    coupon_id  rating
           12       1
           37       1
           11       2

python python-2.7 pandas dataframe

Vlad Shut May 12 '17 at 10:52

1 answer

MaxU · Accepted Answer · 2017-05-12T11:10:06+0000

UPDATE: for any number of dataframes, concatenate them and sum `rating` per `coupon_id`:

In [46]: d1
Out[46]:
    coupon_id  rating
69         12       1

In [47]: d2
Out[47]:
     coupon_id  rating
101         37       1

In [48]: d3
Out[48]:
     coupon_id  rating
428         11       1

In [49]: d4
Out[49]:
      coupon_id  rating
1133         11       1

In [50]: pd.concat([d1,d2,d3,d4],ignore_index=True).groupby('coupon_id', as_index=False)['rating'].sum()
Out[50]:
   coupon_id  rating
0         11       2
1         12       1
2         37       1
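
Applied to the loop in the question, the same group-and-sum idea might remove the per-user merging entirely. The sketch below is only an illustration under assumptions: `rating_data` stands in for `self.ratingData`, `most_similar_users` is assumed to hold `(patient_id, similarity)` pairs as the indexing `similarUser[0]` and `similarUser[1]` suggests, and the helper name `weighted_rating_sums` and all sample values are made up.

```python
import pandas as pd

# Hypothetical stand-in for self.ratingData from the question.
rating_data = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3],
    "coupon_id": [11, 12, 11, 37, 12],
    "rating": [1, 2, 3, 1, 5],
})

# Hypothetical (patient_id, similarity) pairs.
most_similar_users = [(1, 0.5), (2, 2.0)]


def weighted_rating_sums(rating_data, most_similar_users):
    """Sum similarity-weighted ratings per coupon_id without an explicit loop."""
    # Keep only the first two fields of each entry: user id and weight.
    weights = pd.DataFrame(
        [(user[0], user[1]) for user in most_similar_users],
        columns=["patient_id", "weight"],
    )
    # The inner merge keeps only ratings made by the similar users.
    merged = rating_data.merge(weights, on="patient_id", how="inner")
    # Mirrors int(x) * similarUser[1] from the original loop.
    merged["rating"] = merged["rating"].astype(int) * merged["weight"]
    return merged.groupby("coupon_id", as_index=False)["rating"].sum()


result = weighted_rating_sums(rating_data, most_similar_users)
print(result)
```

Patient 3 is not in `most_similar_users`, so that rating drops out at the merge step, which matches what the row filter inside the original loop does.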

OLD answer (for two dataframes): index both by `coupon_id` and add them, treating missing entries as 0:

In [219]: d1.set_index('coupon_id').add(d2.set_index('coupon_id'), fill_value=0) \
            .reset_index()
Out[219]:
   coupon_id  rating
0          1    40.0
1          2   140.0
2          3   110.0
3          4    70.0
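
If the index-aligned `add` approach is preferred for more than two dataframes, one possible way to extend it is to fold the list with `functools.reduce`. This is a sketch, not part of the original answer: it assumes each `coupon_id` appears at most once within a single frame, and `d3` is a hypothetical third frame added for illustration. When ids can repeat inside a frame, the `concat` plus `groupby` approach is likely the safer choice.

```python
from functools import reduce

import pandas as pd

# The two frames from the question's example.
d1 = pd.DataFrame({"coupon_id": [1, 2, 3], "rating": [40, 60, 50]})
d2 = pd.DataFrame({"coupon_id": [4, 2, 3], "rating": [70, 80, 60]})
# Hypothetical third frame, added only for illustration.
d3 = pd.DataFrame({"coupon_id": [1, 5], "rating": [10, 20]})

# Index every frame by coupon_id so that add() aligns rows on it.
frames = [d.set_index("coupon_id") for d in (d1, d2, d3)]

# Add the frames pairwise; fill_value=0 treats a coupon missing
# from one side as contributing 0 instead of producing NaN.
total = reduce(
    lambda left, right: left.add(right, fill_value=0), frames
).reset_index()

print(total)
```

As in the two-frame output above, the summed `rating` column comes back as floats, because alignment introduces missing values before they are filled.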
