Python - binary encoding a comma separated string column
Can anyone help me with binary encoding data that looks like this:
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3],
                   'test': ['one,two,three', 'one,two', 'two']})
print(df)
   _id           test
0    1  one,two,three
1    2        one,two
2    3            two
so that it ends up like this:
df_result = pd.DataFrame({'id': [1, 2, 3],
                          'one': [1, 1, 0],
                          'two': [1, 1, 1],
                          'three': [1, 0, 0]})
print(df_result)
   id  one  three  two
0   1    1      1    1
1   2    1      0    1
2   3    0      0    1
Any help would be much appreciated! Thanks.
1 answer
Use str.get_dummies()
In [58]: df.test.str.get_dummies(',')
Out[58]:
   one  three  two
0    1      1    1
1    1      0    1
2    0      0    1
Use join to attach the result to the original DataFrame if necessary.
In [62]: df.join(df.test.str.get_dummies(','))
Out[62]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1
Or use pd.concat:
In [63]: pd.concat([df, df.test.str.get_dummies(',')], axis=1)
Out[63]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1
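If you want something closer to the exact layout asked for in the question, here is a minimal end-to-end sketch. Dropping the test column and renaming _id to id are assumptions read off the desired output in the question; the name dummies is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3],
                   'test': ['one,two,three', 'one,two', 'two']})

# One 0/1 indicator column per distinct comma-separated token.
dummies = df['test'].str.get_dummies(sep=',')

# Assumption: the encoded columns should replace 'test',
# and the key column should be called 'id' as in the question's target.
df_result = (df.drop(columns='test')
               .rename(columns={'_id': 'id'})
               .join(dummies))

print(df_result)
```

The join lines up on the row index, so this relies on dummies keeping the same index as df, which is the case here because it is derived directly from df['test'].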