Pandas - split a column that contains strings and lists
I have a DataFrame with a column that holds a plain string on some rows and a list (stored as its string representation) on others. How can I split the lists into separate columns? This is what I have:
>>> df2 = pd.DataFrame(["abc","[u'abc', u'xyz']"])
>>> df2
0
0 abc
1 [u'abc', u'xyz']
I would like to get this:
0 1
0 abc None
1 abc xyz
I tried something like this, but it fails:
>>> for col, col_data in df2.iteritems():
...     col_data = pd.get_dummies(pd.DataFrame(list(col_data)), prefix = col)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/remote/iims003/harpreet/anaconda2/lib/python2.7/site-packages/pandas/core/reshape.py", line 1095, in get_dummies
    for (col, pre, sep) in zip(columns_to_encode, prefix, prefix_sep):
TypeError: izip argument #2 must support iteration
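The traceback comes from prefix = col: col is the integer column label 0, while get_dummies expects prefix to be a string, a list, or a dict, so zip cannot iterate over it. Even with a valid string prefix, get_dummies is the wrong tool here: it does one-hot encoding (one indicator column per distinct value) instead of splitting a list into positional columns. A minimal sketch, where the prefix "col" is a hypothetical label chosen only for illustration:

```python
import pandas as pd

# The question's column: one plain string and one string that looks like a list.
df2 = pd.DataFrame(["abc", "[u'abc', u'xyz']"])

# get_dummies one-hot encodes: it creates one indicator column per distinct
# value (here the whole list-like string counts as a single value).
dummies = pd.get_dummies(df2[0], prefix="col")

print(dummies.columns.tolist())
```

The resulting columns are indicators such as col_abc, not the abc / xyz split the question asks for.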
1 answer
You can use apply with a function that returns a Series:
In [11]: from ast import literal_eval

In [12]: def to_series(s):
    ...:     try:
    ...:         return pd.Series(literal_eval(s))  # makes it an actual list
    ...:     except ValueError:
    ...:         return pd.Series([s])
    ...:
In [13]: df2[0].apply(to_series)
Out[13]:
0 1
0 abc NaN
1 abc xyz
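For reference, here is a self-contained Python 3 sketch of the same approach. Two assumptions go beyond the answer: the except clause also catches SyntaxError, which literal_eval raises for text such as "a b" that is not valid Python, and the second half shows a variant that is not part of the original answer. That variant parses each cell into a list with a hypothetical helper named to_list and hands the lists to the DataFrame constructor, which pads shorter rows with a missing value (None for object columns), closer to the None in the question's desired output.

```python
from ast import literal_eval

import pandas as pd

df2 = pd.DataFrame(["abc", "[u'abc', u'xyz']"])


def to_series(s):
    """Turn a list-like string into a Series; wrap any other string in a 1-element Series."""
    try:
        # literal_eval safely evaluates Python literals such as "[u'abc', u'xyz']"
        return pd.Series(literal_eval(s))
    except (ValueError, SyntaxError):
        # "abc" is not a literal (ValueError); "a b" is not valid syntax (SyntaxError)
        return pd.Series([s])


# Series.apply with a Series-returning function expands into a DataFrame;
# shorter rows get NaN in the missing positions.
result = df2[0].apply(to_series)
print(result)


def to_list(s):
    """Hypothetical helper: parse list-like strings, keep anything else as a 1-item list."""
    try:
        value = literal_eval(s)
    except (ValueError, SyntaxError):
        return [s]
    # Only accept real lists; other literals (e.g. "123") keep the original text.
    return value if isinstance(value, list) else [s]


# The DataFrame constructor pads ragged lists with missing values.
padded = pd.DataFrame(df2[0].map(to_list).tolist(), index=df2.index)
print(padded)
```

The constructor variant skips building one Series per row, which tends to matter on large frames. The apply version stays closer to the answer as written.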