Split Pandas series without multi-index

I would like to take a Pandas Series with a single-level index and split it on that index into a DataFrame with multiple columns. For example, for the input:

s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])

s
a    10
a    11
b    12
b    13
c    14
c    15
c    16
dtype: int64

      

What I would like as output:

    a    b    c
0   10   12   14
1   11   13   15
2   NaN  NaN  16

      

I cannot use unstack directly because it requires a MultiIndex and I only have a single-level index. I tried adding a dummy index level with the same value on every row, but I got the error "ReshapeError: Index contains duplicate entries, cannot reshape".
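A minimal sketch of why the dummy-level attempt fails and what unstack actually needs: every (row label, column label) pair must be unique, and a constant dummy level repeats pairs such as ('a', 0). Numbering the rows within each label with groupby(level=0).cumcount() gives a unique second level. Note that cumcount is a swapped-in technique, not something taken from the question or either answer, and the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# A constant dummy level repeats (label, 0) pairs, so unstack cannot reshape.
dummy = s.copy()
dummy.index = pd.MultiIndex.from_arrays([s.index, [0] * len(s)])
failed = False
try:
    dummy.unstack(0)
except ValueError as exc:
    failed = True
    print("unstack failed:", exc)

# Numbering rows within each label (0, 1, 0, 1, 0, 1, 2) gives unique pairs.
pos = s.groupby(level=0).cumcount()
keyed = s.copy()
keyed.index = pd.MultiIndex.from_arrays([s.index, pos.values])

# Level 0 (the original labels) becomes the columns; missing cells are NaN.
wide = keyed.unstack(0)
print(wide)
```

Because the padded cells are NaN, the resulting columns come back as floats rather than ints.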

I know this is a bit unusual because 1) pandas doesn't like ragged arrays, so the shorter columns will need to be padded, 2) the index has to be reset arbitrarily, and 3) you can't really "initialize" the DataFrame until you know how long the longest column will be. But it still seems like something I should be able to do somehow. I also thought about doing this with groupby, but there doesn't seem to be anything like grouped_df.values() without some kind of aggregation function, perhaps for the reasons above.
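The three obstacles listed above can each be handled explicitly. A groupby object can be iterated directly, yielding (key, sub-series) pairs with no aggregation function involved. A minimal sketch, assuming the goal is only the padded frame shown above (the names longest and columns are illustrative):

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# Point 3: the longest column is the size of the largest group.
longest = s.groupby(level=0).size().max()

columns = {}
# Iterating a groupby yields (key, sub-series) pairs; nothing is aggregated.
for key, grp in s.groupby(level=0):
    # Point 2: drop the repeated label and renumber the rows 0..n-1.
    col = grp.reset_index(drop=True)
    # Point 1: pad shorter groups with NaN up to the longest length.
    columns[key] = col.reindex(range(longest))

wide = pd.DataFrame(columns)
print(wide)
```

The explicit reindex is not strictly required, since the DataFrame constructor already aligns the columns on the union of their indices and fills gaps with NaN; it is kept here only to make the padding step visible.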



2 answers


You can use groupby, apply, and reset_index to create a Series with a MultiIndex, and then call unstack:

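import pandas as pd
s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])
# Renumber each group 0..n-1 (the group label becomes index level 0),
# then pivot level 0 into the columns
df = s.groupby(level=0).apply(pd.Series.reset_index, drop=True).unstack(0)
print(df)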

      



output:

   a   b   c
0  10  12  14
1  11  13  15
2 NaN NaN  16
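To see what the chain above is doing, it can help to stop just before unstack(0). The apply step turns each group into a Series with a fresh 0..n-1 index, and groupby prepends the group label, giving a two-level index that unstack can pivot. A sketch; group_keys=True is spelled out only for clarity and is assumed to match the default in recent pandas versions.

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# Same chain as the answer, stopped before unstack(0).
stacked = s.groupby(level=0, group_keys=True).apply(pd.Series.reset_index, drop=True)
# stacked has a two-level index: (group label, position within the group)
print(stacked)

# Move level 0 (the labels) into the columns; missing positions become NaN.
wide = stacked.unstack(0)
print(wide)
```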

      



Not sure how well this generalizes. I call this the groupby-concat pattern: essentially an apply, but with control over exactly how the pieces are combined.



In [24]: s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])

In [25]: df = pd.DataFrame(dict(key=s.index, value=s.values))

In [26]: df
Out[26]: 
  key  value
0   a     10
1   a     11
2   b     12
3   b     13
4   c     14
5   c     15
6   c     16

In [27]: pd.concat({g: pd.Series(grp['value'].values) for g, grp in df.groupby('key')}, axis=1)
Out[27]: 
    a   b   c
0  10  12  14
1  11  13  15
2 NaN NaN  16
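The same groupby-concat pattern can be applied to the Series directly, without building the intermediate key/value DataFrame, because grouping by level=0 already yields the per-label sub-series. A minimal sketch, assuming a single-level index as in the question; concat_groups is a hypothetical helper name, and the join parameter is simply passed through to pd.concat.

```python
import pandas as pd

def concat_groups(series: pd.Series, join: str = "outer") -> pd.DataFrame:
    """Hypothetical helper: one column per index label, combined by pd.concat."""
    pieces = {
        key: grp.reset_index(drop=True)  # renumber rows 0..n-1 within each group
        for key, grp in series.groupby(level=0)
    }
    # Dict keys become column labels; rows align on position.
    # join="outer" pads short groups with NaN, join="inner" keeps shared rows only.
    return pd.concat(pieces, axis=1, join=join)

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])
wide = concat_groups(s)
print(wide)
```

Calling concat_groups(s, join="inner") instead keeps only the positions every group has (here rows 0 and 1), which is one concrete example of the extra control over combining that this pattern gives compared with a plain apply.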

      
