Pandas: Pivot to True / False, drop column

I'm trying to create what I consider to be a simple pivot table, but I'm having serious problems. There are two things that I cannot do:

  • Get rid of the "partner" column at the end.
  • Set to True or False if every company has this partner.

Setting:

df = pd.DataFrame({'company':['a','b','c','b'], 'partner':['x','x','y','y'], 'str':['just','some','random','words']})

      

Desired output:

company     x      y  
a        True  False
b        True   True
c       False   True

      

I started with:

df = df.pivot(values = 'partner', columns = 'partner', index = 'company').reset_index()

      

which shuts me down, but when I try to get rid of the "partner" column, I can't even reference it, and it's not "index".

In the second problem, I can use:

df.fillna(False, inplace = True)
df.loc[~(df['x'] == False), 'x'] = True
df.loc[~(df['y'] == False), 'y'] = True

      

but that seems incredibly hacky. Any help would be appreciated.

+1


source to share


1 answer


Option 1

df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool)


partner      x      y
company              
a         True  False
b         True   True
c        False   True

      

Get rid of names in object columns

df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool) \
    .rename_axis(None, 1).reset_index()

  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True

      

Option 2

pd.crosstab(df.company, df.partner).astype(bool)

partner      x      y
company              
a         True  False
b         True   True
c        False   True


pd.crosstab(df.company, df.partner).astype(bool) \
    .rename_axis(None, 1).reset_index()

  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True

      



Option 3

f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size

b = np.bincount(f1 * m + f2)
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)

pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))

  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True

      


Timing
small data

%timeit df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool).rename_axis(None, 1).reset_index()
%timeit pd.crosstab(df.company, df.partner).astype(bool).rename_axis(None, 1).reset_index()

%%timeit
f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size

b = np.bincount(f1 * m + f2)
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)

pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))

1000 loops, best of 3: 1.67 ms per loop
100 loops, best of 3: 5.97 ms per loop
1000 loops, best of 3: 301 ยตs per loop

      

+3


source







All Articles