Remove rows from two DataFrames whose column values are not common to both
I have these two DataFrames:
Active:
Customer_ID | product_No| Rating
7 | 111 | 3.0
7 | 222 | 1.0
7 | 333 | 5.0
7 | 444 | 3.0
User:
Customer_ID | product_No| Rating
9 | 111 | 2.0
9 | 222 | 5.0
9 | 666 | 5.0
9 | 555 | 3.0
I want to keep only the ratings for products that both users have rated (e.g. 111, 222) and remove any products that only one of them rated (e.g. 444, 333, 555, 666). So the new DataFrames should look like this:
Active:
Customer_ID | product_No| Rating
7 | 111 | 3.0
7 | 222 | 1.0
User:
Customer_ID | product_No| Rating
9 | 111 | 2.0
9 | 222 | 5.0
I don't know how to do this without loops. Can you help me, please?
This is the code I have so far:
import pandas as pd
ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating'])
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]
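To make the setup reproducible without the actual file, here is a minimal sketch that assumes ratings.csv contains exactly the eight rows from the tables above and has no header row (which is why names=... is passed). The io.StringIO stand-in and the csv_text variable are hypothetical; only the split by Customer_ID mirrors the code above.

```python
import io

import pandas as pd

# Hypothetical stand-in for ratings.csv: the rows from the tables above,
# assumed to be stored without a header row (hence names=...).
csv_text = """7,111,3.0
7,222,1.0
7,333,5.0
7,444,3.0
9,111,2.0
9,222,5.0
9,666,5.0
9,555,3.0
"""

ratings = pd.read_csv(io.StringIO(csv_text),
                      names=['Customer_ID', 'product_No', 'Rating'])

# Split the combined ratings into one DataFrame per customer.
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]

print(active)
print(user)
```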
You can first get the common product_No values with a set intersection, and then use the isin method to filter the original DataFrames:
common_product = set(active.product_No).intersection(user.product_No)
common_product
# {111, 222}
active[active.product_No.isin(common_product)]
#Customer_ID product_No Rating
#0 7 111 3.0
#1 7 222 1.0
user[user.product_No.isin(common_product)]
#Customer_ID product_No Rating
#0 9 111 2.0
#1 9 222 5.0
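A self-contained version of this approach, with the sample data from the question typed in directly instead of read from a file, might look like the sketch below. The names active_common and user_common are only illustrative.

```python
import pandas as pd

# Sample data copied from the question's tables.
active = pd.DataFrame({'Customer_ID': [7, 7, 7, 7],
                       'product_No': [111, 222, 333, 444],
                       'Rating': [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({'Customer_ID': [9, 9, 9, 9],
                     'product_No': [111, 222, 666, 555],
                     'Rating': [2.0, 5.0, 5.0, 3.0]})

# Products rated by both customers.
common_product = set(active['product_No']).intersection(user['product_No'])

# isin returns a boolean mask, so no explicit Python loop over rows is needed.
active_common = active[active['product_No'].isin(common_product)]
user_common = user[user['product_No'].isin(common_product)]

print(active_common)
print(user_common)
```

Since isin also accepts a Series, active[active['product_No'].isin(user['product_No'])] selects the same rows without building the set first.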
I tried it with an INNER JOIN, as follows:
import pandas as pd
df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
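print(df1)
print(df2)
# Inner join on product_No keeps only the products present in both files
df_ij = pd.merge(df1, df2, on='product_No', how='inner')
print(df_ij)
df_list = []
# Columns shared by both frames got the suffixes _x (df1) and _y (df2)
for suffx in ['_x', '_y']:
    df_e = df_ij[['Customer_ID' + suffx, 'product_No', 'Rating' + suffx]].copy()
    df_e.columns = list(df1)  # restore the original column names
    df_list.append(df_e)
print(df_list[0])
print(df_list[1])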
It gives the following output:
# print df1
Customer_ID product_No Rating
0 7 111 3
1 7 222 1
2 7 333 5
3 7 444 3
# print df2
Customer_ID product_No Rating
0 9 111 2
1 9 222 5
2 9 777 5
3 9 555 3
# print the INNER JOINed df
Customer_ID_x product_No Rating_x Customer_ID_y Rating_y
0 7 111 3 9 2
1 7 222 1 9 5
# print the first df you want, with common 'product_No'
Customer_ID product_No Rating
0 7 111 3
1 7 222 1
# print the second df you want, with common 'product_No'
Customer_ID product_No Rating
0 9 111 2
1 9 222 5
An INNER JOIN keeps only the rows whose join key (here product_No) appears in both DataFrames. Because the two DataFrames share column names, the merged DataFrame adds suffixes (_x and _y) to the columns that are not used in the join so they can be told apart. You then just need to extract the columns with the appropriate suffix to get each of the results you want.
There is a good example of an INNER JOIN here.
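The column extraction after the join can also be done without the loop, by selecting each side's columns and renaming them back. This is a sketch using the question's sample data (with 666 rather than 777 in the second frame); the names left and right are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({'Customer_ID': [7, 7, 7, 7],
                    'product_No': [111, 222, 333, 444],
                    'Rating': [3.0, 1.0, 5.0, 3.0]})
df2 = pd.DataFrame({'Customer_ID': [9, 9, 9, 9],
                    'product_No': [111, 222, 666, 555],
                    'Rating': [2.0, 5.0, 5.0, 3.0]})

# The inner join keeps only product_No values present in both frames;
# overlapping non-key columns get the default suffixes _x (left) and _y (right).
df_ij = pd.merge(df1, df2, on='product_No', how='inner')

# Select each side's columns and strip the suffix to restore the original names.
left = df_ij[['Customer_ID_x', 'product_No', 'Rating_x']].rename(
    columns={'Customer_ID_x': 'Customer_ID', 'Rating_x': 'Rating'})
right = df_ij[['Customer_ID_y', 'product_No', 'Rating_y']].rename(
    columns={'Customer_ID_y': 'Customer_ID', 'Rating_y': 'Rating'})

print(left)
print(right)
```

One design caveat: if the same product_No appears more than once in either frame, the inner join pairs every matching row with every other, so rows can be duplicated in the result. The isin filter does not have that problem.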
Here is my answer to this question:
import pandas as pd
dict1={"Customer_id":[7,7,7,7],
"Product_No":[111,222,333,444],
"rating":[3.0,1.0,5.0,3.0]}
active=pd.DataFrame(dict1)
dict2={"Customer_id":[9,9,9,9],
"Product_No":[111,222,666,555],
"rating":[2.0,5.0,5.0,3.0]}
user=pd.DataFrame(dict2)
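# Inner join keeps only products that appear in both frames;
# the non-key columns get _x (active) and _y (user) suffixes
df3=pd.merge(active,user,on="Product_No",how="inner")
print(df3)
# Pick each side's columns and drop the suffixes again
active=df3[["Customer_id_x","Product_No","rating_x"]].rename(columns={"Customer_id_x":"Customer_id","rating_x":"rating"})
print(active)
user=df3[["Customer_id_y","Product_No","rating_y"]].rename(columns={"Customer_id_y":"Customer_id","rating_y":"rating"})
print(user)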
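If the ratings are still in the single ratings DataFrame from the question, the same filtering can be done before splitting it, again without loops, by counting how many distinct customers rated each product. This is a sketch of an alternative to the answers above, not one of them; it assumes the question's column names, and the variables n_customers and raters_per_product are hypothetical.

```python
import pandas as pd

# The question's combined ratings, typed in instead of read from ratings.csv.
ratings = pd.DataFrame({'Customer_ID': [7, 7, 7, 7, 9, 9, 9, 9],
                        'product_No': [111, 222, 333, 444, 111, 222, 666, 555],
                        'Rating': [3.0, 1.0, 5.0, 3.0, 2.0, 5.0, 5.0, 3.0]})

# Number of distinct customers in the data (2 here: 7 and 9).
n_customers = ratings['Customer_ID'].nunique()

# For every row, the number of distinct customers who rated that row's product.
raters_per_product = ratings.groupby('product_No')['Customer_ID'].transform('nunique')

# Keep only products rated by every customer, then split per customer as before.
common = ratings[raters_per_product == n_customers]
active = common[common['Customer_ID'] == 7]
user = common[common['Customer_ID'] == 9]

print(active)
print(user)
```

Because the mask compares against the number of distinct customers, the same code keeps only the products rated by all of them when there are more than two customers.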