Python - binary encoding a comma separated string column

Can anyone help me with binary encoding data that looks like this:

df = pd.DataFrame({'_id': [1,2,3],
                   'test': ['one,two,three', 'one,two', 'two']})

print(df)

   _id           test
0    1  one,two,three
1    2        one,two
2    3            two

      

into this:

df_result = pd.DataFrame({'id': [1,2,3],
                          'one': [1,1,0],
                          'two': [1,1,1],
                          'three': [1,0,0]})
print(df_result)

   id  one  three  two
0   1    1      1    1
1   2    1      0    1
2   3    0      0    1

      

Any help would be much appreciated! Thanks.



1 answer


Use str.get_dummies()

In [58]: df.test.str.get_dummies(',')
Out[58]:
   one  three  two
0    1      1    1
1    1      0    1
2    0      0    1
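
One thing to watch for: str.get_dummies(',') splits on the literal comma only, so a value like 'one, two' would produce a separate ' two' column with a leading space. The data in the question has no such spaces, so this is only a precaution. A minimal sketch, assuming hypothetical input with stray whitespace (the series `s` below is made up for illustration):

```python
import pandas as pd

# Hypothetical data: same labels as in the question, but with stray spaces
# around the commas (not part of the original example).
s = pd.Series(['one, two ,three', 'one , two', ' two'])

# Normalise the separators and trim the ends before encoding,
# so that ' two' and 'two' end up in the same column.
cleaned = s.str.replace(r'\s*,\s*', ',', regex=True).str.strip()
encoded = cleaned.str.get_dummies(',')
print(encoded)
```

Without the cleaning step, the padded and unpadded spellings of the same label would be encoded as different columns.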

      

Use join to attach the result to the original DataFrame if necessary.



In [62]: df.join(df.test.str.get_dummies(','))
Out[62]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1

      

Or, pd.concat

In [63]: pd.concat([df, df.test.str.get_dummies(',')], axis=1)
Out[63]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1
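
To get the exact frame shown in the question, the original test column can be dropped before joining. A minimal sketch putting the pieces together as a standalone script; the rename of _id to id is an assumption based on the expected output in the question, and the variable names `dummies` and `result` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3],
                   'test': ['one,two,three', 'one,two', 'two']})

# One indicator column per distinct value found in the comma separated strings.
dummies = df['test'].str.get_dummies(sep=',')

# Drop the raw string column, attach the indicators by index,
# and rename _id to id (assumed from the question's expected output).
result = df.drop(columns='test').join(dummies).rename(columns={'_id': 'id'})
print(result)
```

The indicator columns come back in alphabetical order (one, three, two), which matches the printed df_result in the question.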

      
