Pandas: how to convert a list to a matrix grouped by column?

I have a pandas DataFrame in which the first column (Customer) holds the customer name, repeated once for every product (Product) that the customer purchased:

Customer  Product  Count
John      A        1
John      B        1
John      C        1
Mary      A        1
Mary      B        1
Charles   A        1

      

I want to expand this data into a new DataFrame where both the rows and the columns are the product categories (Product) and each value is the number of customers who bought both products, as shown below:

Product
       A     B     C
A      0     2     1
B      2     0     1
C      1     1     0

      

So, if John bought A and also bought B, +1 is added to cell A:B. He also bought A in combination with C, so +1 is added to cell A:C, and so on. Note that Charles does not contribute to this matrix because he bought only one product.

I tried using pandas.pivot_table, but this is what I got:

df = pd.pivot_table(df, index=['Product'], columns=['Product'], values=['Customer'])

>> KeyError: 'Level Product not found'

      

Which method and parameters should I use?



1 answer


Self merge with crosstab

d1 = df.merge(df, on='Customer').query('Product_x != Product_y')
pd.crosstab(d1.Product_x, d1.Product_y)

Product_y  A  B  C
Product_x         
A          0  2  1
B          2  0  1
C          1  1  0
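For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question; the variable names `pairs` and `result` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Self merge: pair every product with every product bought by the same
# customer. Overlapping column names get the default _x / _y suffixes.
pairs = df.merge(df, on="Customer")

# Drop the pairs where a product is matched with itself.
pairs = pairs.query("Product_x != Product_y")

# Count how often each (Product_x, Product_y) pair occurs.
result = pd.crosstab(pairs["Product_x"], pairs["Product_y"])
print(result)
```

Charles drops out on his own: his only self-merged row pairs A with A, and the `query` step removes it.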

      




See this answer for a better understanding of how to speed up crosstab. The key concept for this problem is the self merge.
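The linked answer is not reproduced here, so the following is only an assumption about what a faster variant might look like, not a summary of that answer. One commonly used substitute for crosstab is to count the pairs with groupby and size, then pivot with unstack. The self merge stays the same; the name `counts` is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Same self merge as in the answer, without the same-product pairs.
d1 = df.merge(df, on="Customer").query("Product_x != Product_y")

# Count each (Product_x, Product_y) pair, then move Product_y into the
# columns. Pairs that never occur (the diagonal here) are filled with 0.
counts = (
    d1.groupby(["Product_x", "Product_y"])
      .size()
      .unstack(fill_value=0)
)
print(counts)
```

Whether this is actually faster than crosstab depends on the data size and the pandas version, so it is worth timing both on real data.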
