Split Pandas series without multi-index
I would like to take a Pandas Series with a single-level index and split it by that index into a multi-column DataFrame. For example, given the input:
s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])
s
a 10
a 11
b 12
b 13
c 14
c 15
c 16
dtype: int64
What I would like as output:
a b c
0 10 12 14
1 11 13 15
2 NaN NaN 16
I cannot use unstack directly because it requires a MultiIndex and I only have a single-level index. I tried adding a dummy index level with the same value for every row, but got the error "ReshapeError: Index contains duplicate entries, cannot reshape".
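As an aside, the dummy-index idea can work if the extra level numbers the rows within each label instead of repeating one constant value, so that every (label, position) pair is unique. The sketch below does that with GroupBy.cumcount; it is an alternative technique, not taken from the answers below, and it assumes a reasonably recent pandas release.

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# A constant dummy level repeats every (label, dummy) pair, which is the
# "duplicate entries" condition unstack rejects. Number the rows within each
# label instead: 0, 1 for 'a'; 0, 1 for 'b'; 0, 1, 2 for 'c'.
position = s.groupby(level=0).cumcount()

# Two-level index (label, position); every pair is now unique.
index = pd.MultiIndex.from_arrays([s.index, position], names=['key', 'pos'])
s2 = pd.Series(s.to_numpy(), index=index)

# Move the label level into the columns; missing (label, position) pairs become NaN.
df = s2.unstack(level='key')
print(df)
```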
I know this is a bit unusual because 1) pandas doesn't like ragged arrays, so the shorter columns will have to be padded, 2) the index has to be reset arbitrarily, and 3) I can't really initialize the DataFrame until I know how long the longest column will be. But it still seems like something I should be able to do somehow. I also thought about doing this with groupby, but there doesn't seem to be anything like grouped_df.values() without some kind of aggregation function, perhaps for the reasons above.
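For reference, groupby does not need an aggregation function just to expose each group's values: iterating over it yields (label, sub-Series) pairs. The following minimal sketch (assuming pandas is installed; the variable names are illustrative) collects those raw values and lets the DataFrame constructor handle the padding and the final length.

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# Iterating over the groupby gives (label, sub-Series) pairs; no aggregation
# is involved in getting at each group's raw values.
groups = {key: grp.to_numpy() for key, grp in s.groupby(level=0)}

# Wrapping each array in a Series gives it a fresh 0..n-1 index. The DataFrame
# constructor aligns these Series on that index, pads the shorter ones with NaN
# and works out the length of the longest group by itself.
df = pd.DataFrame({key: pd.Series(values) for key, values in groups.items()})
print(df)
```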
You can use groupby, apply and reset_index to create a Series with a MultiIndex, and then call unstack:
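import pandas as pd
s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])
# Reset the index to 0..n-1 inside each group; the group label becomes the outer index level,
# and unstack(0) then turns that outer level into columns
df = s.groupby(level=0).apply(pd.Series.reset_index, drop=True).unstack(0)
print(df)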
output:
a b c
0 10 12 14
1 11 13 15
2 NaN NaN 16
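To see the intermediate MultiIndexed Series the answer relies on, the one-liner can be split into its two steps. This is only the same code unpacked, assuming a pandas version in which groupby/apply adds the group label as an outer index level (recent releases do).

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# Step 1: within each label, replace the repeated index with 0..n-1.
# The per-group results are concatenated with the group label as the outer
# level, giving a two-level (label, position) index.
stacked = s.groupby(level=0).apply(pd.Series.reset_index, drop=True)
print(stacked.index.tolist())

# Step 2: unstack level 0 (the label) into columns.
df = stacked.unstack(0)
print(df)
```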
Not sure how well this generalizes. I call this the groupby-concat pattern: essentially an apply, but with control over exactly how the pieces are combined.
In [24]: s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])
In [25]: df = pd.DataFrame(dict(key = s.index, value = s.values))
In [26]: df
Out[26]:
key value
0 a 10
1 a 11
2 b 12
3 b 13
4 c 14
5 c 15
6 c 16
In [27]: pd.concat(dict([ (g,pd.Series(grp['value'].values)) for g, grp in df.groupby('key') ]),axis=1)
Out[27]:
a b c
0 10 12 14
1 11 13 15
2 NaN NaN 16
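The same pattern also works directly on the Series, without building the key/value DataFrame first, and the concat axis is where the "control over how it is combined" comes in. A minimal sketch, assuming a recent pandas; the names pieces, wide and stacked are illustrative.

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# Groupby-concat applied to the Series itself: one 0..n-1 indexed piece per label.
pieces = {key: grp.reset_index(drop=True) for key, grp in s.groupby(level=0)}

# axis=1 places the pieces side by side as columns named by their keys;
# rows are aligned on the 0..n-1 positions and gaps are filled with NaN.
wide = pd.concat(pieces, axis=1)

# axis=0 stacks the same pieces into one Series with a (key, position) index,
# the same intermediate form the groupby/apply/unstack answer builds.
stacked = pd.concat(pieces, axis=0)
print(wide)
print(stacked)
```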