Pandas: how to convert a list to a matrix grouped by column?

Question

I have a pandas DataFrame where the first column (Customer) holds the customer name, repeated once for every product (Product) that the customer purchased:

Customer  Product  Count
John      A        1
John      B        1
John      C        1
Mary      A        1
Mary      B        1
Charles   A        1

I want to expand this data into a new DataFrame where both the rows and the columns are the product categories (Product) and the values are the number of customers who bought both products, as shown below:

Product
       A     B     C
A      0     2     1
B      2     0     1
C      1     1     0

So, because John bought both A and B, +1 is added to cell (A, B); he also bought A together with C, so +1 is added to cell (A, C), and so on. Note that Charles does not contribute to this result because he bought only one product.

I tried using pandas.pivot_table, but this is what I got:

df = pd.pivot_table(df, index=['Product'], columns=['Product'], values=['Customer'])

>> KeyError: 'Level Product not found'

Which method and parameters should I use?

python numpy pandas

syrup 07 June 17 at 17:47

1 answer

piRSquared · Accepted Answer · 2017-06-07T18:19:47+0000

Self merge with crosstab

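# Pair each product with every other product bought by the same customer
d1 = df.merge(df, on='Customer').query('Product_x != Product_y')
# Count how many times each (Product_x, Product_y) pair occurs
pd.crosstab(d1.Product_x, d1.Product_y)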

Product_y  A  B  C
Product_x         
A          0  2  1
B          2  0  1
C          1  1  0
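
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same self merge plus crosstab idea. The DataFrame is rebuilt by hand from the table in the question, and the variable names `pairs` and `co` are only illustrative:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Self merge: every product paired with every product of the same customer.
# Overlapping column names get the default suffixes _x and _y.
pairs = df.merge(df, on="Customer")

# Drop the pairs where a product is matched with itself
pairs = pairs[pairs["Product_x"] != pairs["Product_y"]]

# Count how often each remaining product pair occurs
co = pd.crosstab(pairs["Product_x"], pairs["Product_y"])
print(co)
```

Charles drops out naturally: his only self-merged row is (A, A), which the filter removes.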

You can see this answer for a better understanding of how to speed up crosstab. The key concept for this problem is the self merge.
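
If the merged frame gets large, one possible alternative (a different technique, not necessarily the one from the answer referred to above) is to build a customer-by-product indicator matrix and multiply it by its own transpose. This is only a sketch under the assumption that each customer/product pair should count at most once; the names `basket` and `co` are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Rows are customers, columns are products, values are 0/1 indicators.
# clip(upper=1) guards against a customer/product pair listed twice.
basket = pd.crosstab(df["Customer"], df["Product"]).clip(upper=1)

# (product x customer) dot (customer x product) gives, for each product
# pair, the number of customers who bought both.
co = basket.T.dot(basket)

# The diagonal holds the number of buyers per product; zero it out
# on a copy so the result matches the layout asked for in the question.
values = co.to_numpy().copy()
np.fill_diagonal(values, 0)
co = pd.DataFrame(values, index=co.index, columns=co.columns)
print(co)
```

This avoids materialising one row per product pair per customer, at the cost of holding the full customer-by-product matrix in memory.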
