Fastest way to set Pandas Dataframe elements based on function with index and column value as input

I have one Pandas dataframe column:

s = 
      VALUE
INDEX
A     12
B     21
C     7
...
Y     21
Z     7

      

I want to convert it into a square matrix mask whose index and columns are both s.index, with each element True if the values at the row label and the column label are the same in s, and False otherwise.

mask = 
      A     B     C ...      Y     Z 
A  True False False ...  False False
B False  True False ...   True False
C False False  True ...  False  True
...
Y False  True False ...   True False
Z False False  True ...  False  True

      

My actual s has 10K+ rows. What's the fastest way to create this mask DataFrame?

One way I tried is to build a two-level dictionary with two nested for loops (for example, dict['A']['B'] = dict['B']['A'] = True if s.loc['A'] == s.loc['B'] else False, etc.), then convert each bottom-level dict to a Pandas Series (for example, row = pd.Series(dict['A'])) and append that Series to mask. mask is thus built up row by row.

This is very time-consuming, because it has to explicitly loop through 10K x 10K / 2 = 50M elements, which is not ideal.
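
For reference, the loop-based approach described above might look roughly like the following. This is only a minimal sketch on a small hypothetical sample of s; the names labels, values, nested, and loop_mask are illustrative and not from the original code, and the comparison uses a plain dict lookup rather than s.loc.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown above.
s = pd.DataFrame(
    {"VALUE": [12, 21, 7, 21, 7]},
    index=pd.Index(["A", "B", "C", "Y", "Z"], name="INDEX"),
)

labels = list(s.index)
values = s["VALUE"].to_dict()          # label -> value lookup
nested = {label: {} for label in labels}

# Visit each unordered pair once and fill both symmetric entries.
for i, a in enumerate(labels):
    for b in labels[i:]:
        same = values[a] == values[b]
        nested[a][b] = same
        nested[b][a] = same

# One Series per row, assembled into a frame, then put back in label order.
rows = {label: pd.Series(nested[label]) for label in labels}
loop_mask = pd.DataFrame(rows).T.reindex(index=labels, columns=labels)
print(loop_mask)
```

The two Python-level loops run about n x n / 2 iterations, which is where the 50M figure for 10K rows comes from.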



1 answer


Use NumPy broadcasting:



v = s.VALUE.values
# v[:, None] is a column vector; comparing it with v broadcasts to an n x n boolean array
pd.DataFrame(v == v[:, None], s.index, s.index)

INDEX      A      B      C      Y      Z
INDEX                                   
A       True  False  False  False  False
B      False   True  False   True  False
C      False  False   True  False   True
Y      False   True  False   True  False
Z      False  False   True  False   True
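
To try this end to end, here is a self-contained sketch. It assumes a small hypothetical sample of s with the five rows shown in the question; the names eq and mask are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the rows shown in the question.
s = pd.DataFrame(
    {"VALUE": [12, 21, 7, 21, 7]},
    index=pd.Index(["A", "B", "C", "Y", "Z"], name="INDEX"),
)

v = s["VALUE"].to_numpy()  # 1-D array, shape (n,)

# v[:, None] has shape (n, 1). Comparing it with v (shape (n,)) broadcasts
# both operands to (n, n), so eq[i, j] is True exactly when v[i] == v[j].
eq = v[:, None] == v

mask = pd.DataFrame(eq, index=s.index, columns=s.index)
print(mask)
```

The comparison runs in compiled NumPy code rather than in Python loops. Note that the result is dense: for 10K rows it holds 10,000 x 10,000 = 100M booleans, which is roughly 100 MB at one byte per element.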

      
