Fastest way to set Pandas Dataframe elements based on function with index and column value as input
I have a pandas DataFrame with a single column:
s =
VALUE
INDEX
A 12
B 21
C 7
...
Y 21
Z 7
I want to turn it into a square matrix mask whose index and columns are both s.index, with each element True if the row label and the column label have the same value in s, and False otherwise.
mask =
A B C ... Y Z
A True False False ... False False
B False True False ... True False
C False False True ... False True
...
Y False True False ... True False
Z False False True ... False True
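To make the definition of mask concrete, here is a minimal sketch on a five-row sample. It assumes s can be treated as a pandas Series with a unique index, and it uses NumPy broadcasting to compare every value against every other value. The sample values and the variable name values are illustrative only, and this is one possible approach rather than a confirmed fastest one.

```python
import numpy as np
import pandas as pd

# Small illustrative sample; the real s is assumed to have the same layout.
s = pd.Series([12, 21, 7, 21, 7], index=list("ABCYZ"), name="VALUE")

values = s.to_numpy()

# An (n, 1) array compared with a (1, n) array broadcasts to an (n, n) boolean array,
# where element [i, j] is True when values[i] == values[j].
mask = pd.DataFrame(
    values[:, None] == values[None, :],
    index=s.index,
    columns=s.index,
)

print(mask)
```

With 10K rows the result holds 10K x 10K = 100M booleans, so roughly 100 MB as a NumPy bool array, which should be kept in mind before building it in one step.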
My actual s has 10K+ rows. What's the fastest way to create this mask DataFrame?
One way I tried is to build a two-level dictionary with two for loops (for example, dict['A']['B'] = dict['B']['A'] = True if s.loc['A'] == s.loc['B'] else False, etc.). I then convert each inner dict to a pandas Series (for example, row = pd.Series(dict['A'])) and add that Series to mask, so mask is built up row by row.
This is very time-consuming, since it has to explicitly visit about 10K x 10K / 2 = 50M elements, which is not ideal.
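For reference, the loop-based attempt described above can be sketched roughly as follows on a small sample. The names nested, labels and slow_mask are hypothetical, s is assumed to be a Series with a unique index, and the rows are assembled in one step at the end rather than appended one at a time.

```python
import pandas as pd

# Small illustrative sample standing in for the real 10K+ row data.
s = pd.Series([12, 21, 7, 21, 7], index=list("ABCYZ"), name="VALUE")

labels = list(s.index)
nested = {label: {} for label in labels}

# Visit each unordered pair once and fill both symmetric entries.
for i, a in enumerate(labels):
    for b in labels[i:]:
        same = bool(s.loc[a] == s.loc[b])
        nested[a][b] = nested[b][a] = same

# Convert each inner dict to a Series, then assemble the rows into a DataFrame.
rows = {label: pd.Series(nested[label]) for label in labels}
slow_mask = pd.DataFrame(rows).T.loc[labels, labels].astype(bool)

print(slow_mask)
```

The two nested Python loops perform n * (n + 1) / 2 label lookups and comparisons, which is where the roughly 50M steps for 10K rows come from.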