Pandas: how to convert a list to a matrix grouped by column?
I have a pandas DataFrame where the first column (Customer) holds the customer name, repeated once for every product (Product) that customer purchased:
Customer  Product  Count
John      A        1
John      B        1
John      C        1
Mary      A        1
Mary      B        1
Charles   A        1
I want to reshape this data into a new DataFrame where both the rows and the columns are the product categories (Product) and the values are the number of customers who bought both products, as shown below:
Product  A  B  C
A        0  2  1
B        2  0  1
C        1  1  0
So, if John bought A and also bought B, +1 is added to cell A:B. He also bought A in combination with C, so there is +1 in cell A:C, and so on. Note that Charles does not contribute to this table because he only bought one product.
I tried using pandas.pivot_table, but this is what I got:
df = pd.pivot_table(df, index=['Product'], columns=['Product'], values=['Customer'])
>> KeyError: 'Level Product not found'
Which method and parameters should I use?
Self-merge with crosstab
d1 = df.merge(df, on='Customer').query('Product_x != Product_y')
pd.crosstab(d1.Product_x, d1.Product_y)
Product_y  A  B  C
Product_x
A          0  2  1
B          2  0  1
C          1  1  0
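For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same self-merge and crosstab approach. The sample DataFrame is rebuilt from the table in the question, and the names pairs and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Self-merge on Customer: every product a customer bought is paired with
# every product the same customer bought. The overlapping columns get the
# default suffixes _x and _y.
pairs = df.merge(df, on="Customer")

# Drop the pairs of a product with itself (the diagonal).
pairs = pairs[pairs["Product_x"] != pairs["Product_y"]]

# Count how often each ordered pair of products occurs.
result = pd.crosstab(pairs["Product_x"], pairs["Product_y"])
print(result)
```

Charles drops out naturally: his only self-merged row pairs A with A, which the filter removes.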
You can see this answer for a better understanding of how to speed crosstab up. The key concept for this problem was the self-merge.
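If the self-merge gets too large, one possible alternative (not necessarily what the linked answer does) is to build a customer-by-product indicator matrix and multiply it by its own transpose. This is only a sketch: it assumes each Customer/Product combination should count at most once, and the names indicator, m, counts and co_matrix are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Customer x Product indicator: 1 if the customer bought the product.
# clip(upper=1) is an assumption: repeated purchases count only once.
indicator = pd.crosstab(df["Customer"], df["Product"]).clip(upper=1)

# (Product x Customer) @ (Customer x Product) counts, for each pair of
# products, the customers who bought both.
m = indicator.to_numpy()
counts = m.T @ m

# The diagonal holds the number of buyers per product; zero it out to
# match the desired output.
np.fill_diagonal(counts, 0)

co_matrix = pd.DataFrame(counts, index=indicator.columns, columns=indicator.columns)
print(co_matrix)
```

This avoids materializing every pair of rows per customer, which can matter when customers have many products.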