Remove rows from two DataFrames whose column value is not common to both

I have these two DataFrames:

Active:

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0
7           | 333       | 5.0
7           | 444       | 3.0

      

User:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0
9           | 666       | 5.0
9           | 555       | 3.0

      

I want to keep only the ratings of products that both users have rated (e.g. 111, 222) and remove any products that are not common to both (e.g. 333, 444, 555, 666). So the new DataFrames should look like this:

Active:

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0

      

User:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0

      

I don't know how to do this without loops. Can you help me, please?

This is the code I have so far:

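import pandas as pd

# Name the columns explicitly (assumes the CSV has no header row)
ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating'])
# Select the rows that belong to each customer
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]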

      

+3




4 answers


You can first get the common product_No values with a set intersection and then use the isin method to filter the original data frames:



common_product = set(active.product_No).intersection(user.product_No)

common_product
# {111, 222}

active[active.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         7          111      3.0
#1         7          222      1.0

user[user.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         9          111      2.0
#1         9          222      5.0
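
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frames are rebuilt from the tables in the question, and the helper name keep_common is my own invention for illustration, not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})


def keep_common(left, right, key="product_No"):
    """Filter both frames down to the key values that appear in each of them.

    keep_common is a hypothetical helper name used only for this sketch.
    """
    common = set(left[key]).intersection(right[key])
    return left[left[key].isin(common)], right[right[key].isin(common)]


active_common, user_common = keep_common(active, user)
print(active_common)
print(user_common)
```

Because the filtering is done with a boolean mask, each frame keeps its own rows, columns and index untouched, and no explicit loop over rows is needed.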

      

+4




Use query and reference the other data frame with the @ prefix:



active.query('product_No in @user.product_No')

   Customer_ID  product_No  Rating
0            7         111     3.0
1            7         222     1.0

user.query('product_No in @active.product_No')

   Customer_ID  product_No  Rating
0            9         111     2.0
1            9         222     5.0
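
As a rough check, the sketch below compares the query form with a plain isin mask on the question's sample data. It assumes the frames are named active and user as in the question's code; the names via_query and via_isin are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# The @ prefix tells query() to look the name up in the surrounding Python scope.
via_query = active.query("product_No in @user.product_No")

# The same filter written as an explicit boolean mask.
via_isin = active[active["product_No"].isin(user["product_No"])]

# Both forms should select exactly the same rows.
print(via_query.equals(via_isin))
```

Which form to use is mostly a matter of taste: query reads closer to SQL, while the isin mask avoids putting logic inside a string.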

      

+1




I tried it with an INNER JOIN, as follows:

import pandas as pd

df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
print(df1)
print(df2)

# The inner join keeps only the product_No values present in both frames
df_ij = pd.merge(df1, df2, on='product_No', how='inner')
print(df_ij)

# Split the joined frame back into one frame per input, dropping the suffixes
df_list = []
for suffx in ['_x', '_y']:
    df_e = df_ij[['Customer_ID' + suffx, 'product_No', 'Rating' + suffx]].copy()
    df_e.columns = list(df1)
    df_list.append(df_e)

print(df_list[0])
print(df_list[1])

      

It gives the following output:

# print df1
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1
2            7         333       5
3            7         444       3

# print df2
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5
2            9         777       5
3            9         555       3

# print the INNER JOINed df
   Customer_ID_x  product_No  Rating_x  Customer_ID_y  Rating_y
0              7         111         3              9         2
1              7         222         1              9         5

# print the first df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1

# print the second df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5

      

An INNER JOIN selects the rows whose product_No appears in both data frames. Because the columns that are not used in the join have the same names in both inputs, the merged data frame adds suffixes (_x and _y by default) to tell them apart. You then only need to extract the columns with the appropriate suffix to get the result you want.

There is a good example of an INNER JOIN here.
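
The suffix handling described above can also be made explicit through the suffixes argument of pd.merge. The following is only a sketch on the question's sample data; the suffix strings "_active" and "_user" and the helper name split_side are my own choices for illustration.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# Explicit suffixes make it obvious which input each column came from.
merged = pd.merge(active, user, on="product_No", how="inner",
                  suffixes=("_active", "_user"))


def split_side(frame, suffix, key="product_No"):
    """Extract one side of the merged frame and strip its suffix.

    split_side is a hypothetical helper name used only for this sketch.
    """
    cols = [c for c in frame.columns if c.endswith(suffix)]
    side = frame[[key] + cols].copy()
    side.columns = [key] + [c[: -len(suffix)] for c in cols]
    return side


# Restore the original column order of the input frames.
active_common = split_side(merged, "_active")[list(active.columns)]
user_common = split_side(merged, "_user")[list(user.columns)]
print(active_common)
print(user_common)
```

One caveat: if a product_No occurs more than once in either frame, an inner join repeats rows for every matching pair, so the isin-based filter is the safer choice when duplicates are possible.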

0




Here is an answer to this question:

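import pandas as pd

# Build the two sample frames from the question
dict1 = {"Customer_id": [7, 7, 7, 7],
         "Product_No": [111, 222, 333, 444],
         "rating": [3.0, 1.0, 5.0, 3.0]}
active = pd.DataFrame(dict1)
dict2 = {"Customer_id": [9, 9, 9, 9],
         "Product_No": [111, 222, 666, 555],
         "rating": [2.0, 5.0, 5.0, 3.0]}
user = pd.DataFrame(dict2)

# The inner join on Product_No keeps only the products rated by both customers
df3 = pd.merge(active, user, on="Product_No", how="inner")
print(df3)

# The _x columns come from active, the _y columns from user
active = df3[["Customer_id_x", "Product_No", "rating_x"]]
print(active)
user = df3[["Customer_id_y", "Product_No", "rating_y"]]
print(user)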

      

0








