How to simplify the data merging process?
I have multiple data frames with columns: coupon_id
and rating
. I want to combine this data and get one dataframe with all coupon_id
and rating
as the sum of all estimates for this coupon_id
in all data files.
For example. Let's say I have 2 data frames:
| coupon_id | rating | |: ----------- | ------------: | | 1 | 40 | | 2 | 60 | | 3 | 50 |
| coupon_id | rating | |: ----------- | ------------: | | 4 | 70 | | 2 | 80 | | 3 | 60 |
As a result, I want to get this dataframe:
| coupon_id | rating | |: ----------- | ------------: | | 1 | 40 | | 2 | 140 | | 3 | 110 | | 4 | 70 |
For this problem I am using this code, it works but it is inefficient
similar_users_ratings = pd.DataFrame (columns = ['coupon_id', 'rating']) for similarUser in most_similar_users: similar_user_ratings = self.ratingData.loc [self.ratingData ['patient_id'] == similarUser [0],:]. copy () similar_user_ratings.loc [:, 'rating'] = similar_user_ratings.loc [:, 'rating']. apply (lambda x: int (x) * similarUser [1]) del similar_user_ratings ['patient_id'] similar_users_ratings = similar_users_ratings.merge (similar_user_ratings, on = 'coupon_id', how = 'outer') similar_users_ratings ['rating_y']. fillna (.0, inplace = True) similar_users_ratings ['rating_x']. fillna (.0, inplace = True) similar_users_ratings ['rating'] = similar_users_ratings ['rating_x'] + similar_users_ratings ['rating_y'] del similar_users_ratings ['rating_y'] del similar_users_ratings ['rating_x']
How can I simplify this piece of code? Thank.
Actually I have several dataframes like:
coupon_id rating 69 12 1 coupon_id rating 101 37 1 coupon_id rating 428 11 1 coupon_id rating 1133 11 1
Desired dataset:
coupon_id rating 12 1 37 1 11 2
+3
source to share
1 answer
UPDATE:
In [46]: d1
Out[46]:
coupon_id rating
69 12 1
In [47]: d2
Out[47]:
coupon_id rating
101 37 1
In [48]: d3
Out[48]:
coupon_id rating
428 11 1
In [49]: d4
Out[49]:
coupon_id rating
1133 11 1
In [50]: pd.concat([d1,d2,d3,d4],ignore_index=True).groupby('coupon_id', as_index=False)['rating'].sum(
Out[50]:
coupon_id rating
0 11 2
1 12 1
2 37 1
OLD answer:
In [219]: d1.set_index('coupon_id').add(d2.set_index('coupon_id'), fill_value=0) \
.reset_index()
Out[219]:
coupon_id rating
0 1 40.0
1 2 140.0
2 3 110.0
3 4 70.0
+1
source to share