Pivot pandas DataFrame to prefixed columns, not a MultiIndex
I have a time-series DataFrame similar to:
import pandas as pd

ts = pd.DataFrame([['Jan 2000','WidgetCo',0.5, 2], ['Jan 2000','GadgetCo',0.3, 3], ['Jan 2000','SnazzyCo',0.2, 4],
                   ['Feb 2000','WidgetCo',0.4, 2], ['Feb 2000','GadgetCo',0.5, 2.5], ['Feb 2000','SnazzyCo',0.1, 4],
                  ], columns=['month','company','share','price'])
Which looks like:
      month   company  share  price
0  Jan 2000  WidgetCo    0.5    2.0
1  Jan 2000  GadgetCo    0.3    3.0
2  Jan 2000  SnazzyCo    0.2    4.0
3  Feb 2000  WidgetCo    0.4    2.0
4  Feb 2000  GadgetCo    0.5    2.5
5  Feb 2000  SnazzyCo    0.1    4.0
I can pivot this table like so:
pd.pivot_table(ts,index='month', columns='company')
Which gets me:
            share                      price
company  GadgetCo SnazzyCo WidgetCo GadgetCo SnazzyCo WidgetCo
month
Feb 2000      0.5      0.1      0.4      2.5        4        2
Jan 2000      0.3      0.2      0.5      3.0        4        2
This is what I want, except that I need to collapse the MultiIndex so that company is used as a prefix for share and price, like so:
          WidgetCo_share  WidgetCo_price  GadgetCo_share  GadgetCo_price ...
month
Jan 2000             0.5               2             0.3             3.0
Feb 2000             0.4               2             0.5             2.5
I came up with this function to do it, but it looks like a bad solution:
def pivot_table_to_flat(df, column, index):
    res = df.set_index(index)
    cols = res.drop(column, axis=1).columns.values
    resulting_cols = []
    for prefix in res[column].unique():
        for col in cols:
            new_col_name = prefix + '_' + col
            res[new_col_name] = res[res[column] == prefix][col]
            resulting_cols.append(new_col_name)
    return res[resulting_cols]
pivot_table_to_flat(ts, index='month', column='company')
What is the best way to do a pivot that results in prefixed columns rather than a MultiIndex?
I figured it out. Using the data in the MultiIndex makes for a pretty clean solution:
def flatten_multi_index(df):
    mi = df.columns
    suffixes, prefixes = mi.levels
    # MultiIndex.codes was called .labels before pandas 0.24; .labels has since been removed
    col_names = [prefixes[i_p] + '_' + suffixes[i_s] for (i_s, i_p) in zip(*mi.codes)]
    df.columns = col_names
    return df
flatten_multi_index(pd.pivot_table(ts,index='month', columns='company'))
The version above only handles a two-level MultiIndex, but it can be generalized if needed.
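One way such a generalization might look is sketched below. Iterating over a MultiIndex yields one tuple per column, so the levels can be joined without touching levels or codes at all. The name flatten_columns and the sep parameter are hypothetical, and the sketch assumes the innermost level should come first (company before share/price), as in the question.

```python
import pandas as pd

def flatten_columns(df, sep="_"):
    """Return a copy of df with MultiIndex columns joined into flat strings.

    Levels are joined innermost-first, so ('share', 'WidgetCo') becomes
    'WidgetCo_share'. Works for any number of column levels.
    """
    out = df.copy()
    # Iterating over a MultiIndex yields one tuple of level values per column.
    out.columns = [sep.join(str(level) for level in reversed(key))
                   for key in out.columns]
    return out

ts = pd.DataFrame(
    [["Jan 2000", "WidgetCo", 0.5, 2], ["Jan 2000", "GadgetCo", 0.3, 3],
     ["Feb 2000", "WidgetCo", 0.4, 2], ["Feb 2000", "GadgetCo", 0.5, 2.5]],
    columns=["month", "company", "share", "price"],
)
flat = flatten_columns(pd.pivot_table(ts, index="month", columns="company"))
```

Because it returns a copy, the pivoted frame passed in keeps its MultiIndex.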
An update (as of early 2017 and pandas 0.19.2): you can use .values on a MultiIndex, so this snippet should flatten a MultiIndex for those who need it. The snippet is both too clever and not clever enough: it can handle either the row index or the column index of the DataFrame, but it will blow up if the result of getattr(df, way) is not nested (i.e. not a MultiIndex).
def flatten_multi(df, way='index'):  # or way='columns'
    assert way in {'index', 'columns'}, "I'm sorry Dave."
    mi = getattr(df, way)
    flat_names = ["_".join(s) for s in mi.values]
    setattr(df, way, flat_names)
    return df
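The non-nested case mentioned above could be guarded against with an isinstance check. The following is only a sketch of that idea, not part of the original snippet: flatten_axis and its sep parameter are hypothetical names, and level values are passed through str so that non-string labels do not break the join.

```python
import pandas as pd

def flatten_axis(df, axis="columns", sep="_"):
    """Flatten a MultiIndex on the given axis in place; leave flat axes alone."""
    if axis not in {"index", "columns"}:
        raise ValueError("axis must be 'index' or 'columns'")
    labels = getattr(df, axis)
    # Only a MultiIndex yields tuples; a plain Index is left untouched.
    if isinstance(labels, pd.MultiIndex):
        setattr(df, axis, [sep.join(map(str, key)) for key in labels])
    return df

nested = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("share", "WidgetCo"), ("price", "WidgetCo")]),
)
plain = pd.DataFrame({"a": [1], "b": [2]})

flatten_axis(nested)  # columns become 'share_WidgetCo', 'price_WidgetCo'
flatten_axis(plain)   # already flat, so nothing changes
```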
It seems even easier:
df.columns = [' '.join(col).strip() for col in df.columns.values]
This takes a df with MultiIndex columns and flattens the column labels, with the df modified in place.
(ref: @andy-haden, "Python Pandas - How to flatten a hierarchical index on columns")
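The .strip() in that one-liner matters mostly when some columns have an empty second level. As a small illustration (the data is a cut-down version of the question's, and calling reset_index first is an assumption about how the frame is used): after reset_index, the former index becomes the column ('month', ''), which would otherwise end up with a trailing space.

```python
import pandas as pd

ts = pd.DataFrame(
    [["Jan 2000", "WidgetCo", 0.5, 2], ["Feb 2000", "WidgetCo", 0.4, 2]],
    columns=["month", "company", "share", "price"],
)

# reset_index turns the 'month' index into the column ('month', '').
df = pd.pivot_table(ts, index="month", columns="company").reset_index()

# Joining ('month', '') gives 'month ' and strip() trims the trailing space.
df.columns = [" ".join(col).strip() for col in df.columns.values]
```

Here the value columns come out as 'share WidgetCo' and 'price WidgetCo', while the former index column is simply 'month'.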