Pandas - separate a column that contains strings and lists

I have a dataframe with a column that contains plain strings on some rows and lists on others. How can I decompose the lists into separate columns? This is what I have -

>>> df2 = pd.DataFrame(["abc","[u'abc', u'xyz']"])
>>> df2

                  0
0               abc
1  [u'abc', u'xyz']

      

I would like to get to this -

     0     1
0  abc  None
1  abc   xyz

      

I tried something like this, but there are problems with it -

>>> for col, col_data in df2.iteritems():
...   col_data = pd.get_dummies(pd.DataFrame(list(col_data)), prefix = col)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/remote/iims003/harpreet/anaconda2/lib/python2.7/site-packages/pandas/core/reshape.py", line 1095, in get_dummies
    for (col, pre, sep) in zip(columns_to_encode, prefix, prefix_sep):
TypeError: izip argument #2 must support iteration

      


1 answer


You can use apply with a function that returns a Series:



In [11]: from ast import literal_eval

In [12]: def to_series(s):
    ...:     try:
    ...:         return pd.Series(literal_eval(s))  # makes it an actual list
    ...:     except (ValueError, SyntaxError):  # plain strings are not valid literals
    ...:         return pd.Series([s])
    ...:

In [13]: df2[0].apply(to_series)
Out[13]:
     0    1
0  abc  NaN
1  abc  xyz
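
For reference, here is the same idea as a standalone script rather than an IPython session. This is only a sketch under a few assumptions that go beyond the original answer: it assumes Python 3 with a reasonably recent pandas, it also catches SyntaxError (which literal_eval can raise for text such as "abc def"), and it only expands values that parse to a list or tuple, so a string like "123" stays a string.

```python
from ast import literal_eval

import pandas as pd


def to_series(value):
    """Expand a stringified list into a Series; wrap anything else as one item."""
    try:
        parsed = literal_eval(value)
    except (ValueError, SyntaxError):
        # Plain text such as "abc" is not a Python literal, so keep it as is.
        return pd.Series([value])
    if isinstance(parsed, (list, tuple)):
        # Assumption: only list-like literals should be spread across columns.
        return pd.Series(list(parsed))
    return pd.Series([value])


df2 = pd.DataFrame(["abc", "[u'abc', u'xyz']"])

# Because to_series returns a Series, apply builds a DataFrame whose
# columns are the positions within each list; short rows are padded with NaN.
result = df2[0].apply(to_series)
print(result)
```

Note that the missing cell comes back as NaN rather than None, as in the output above.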
