How to expand data based on one column in python (transpose)?

I have a dataset as shown below and I need all different weights for each category in one row and count

Sample_data

  category  weights
1  aa        3.2
2  aa        2.2
3  aa        4.2
4  bb        3.5
5  bb        4.5
6  aa        0.5
7  cc        0.6
8  bb        7.5
9  cc        6.6
10 dd        2.2
11 aa        3.3
12 bb        4.4
13 cc        5.5
14 dd        6.6

      

And I need an account of each unique category and different weights of each category on the same row.

Expected output:

 category count  weight1  weight2  weight3  weight4  weight5   
1 aa      5      3.2      2.2      4.2      0.5      3.3
2 bb      4      3.5      4.5      7.5      4.4
3 cc      3      0.6      6.6      5.5
4 dd      2      2.2      6.6

      

I thought that

sampledata['category'].groupby(level = 0)   

      

will work, but it doesn't. Can anyone help me how to do this in python.

+3


source to share


3 answers


Perhaps I could shorten this, but the following works:

In [51]:

cat = df.groupby('category')['weights'].agg({'count':'count', 'weight_cat':lambda x: list(x)}).reset_index()
cat
Out[51]:
  category  count                 weight_cat
0       aa      5  [3.2, 2.2, 4.2, 0.5, 3.3]
1       bb      4       [3.5, 4.5, 7.5, 4.4]
2       cc      3            [0.6, 6.6, 5.5]
3       dd      2                 [2.2, 6.6]
In [52]:

cat = cat.join(cat['weight_cat'].apply(lambda x: pd.Series(x)))
cat
Out[52]:
  category  count                 weight_cat    0    1    2    3    4
0       aa      5  [3.2, 2.2, 4.2, 0.5, 3.3]  3.2  2.2  4.2  0.5  3.3
1       bb      4       [3.5, 4.5, 7.5, 4.4]  3.5  4.5  7.5  4.4  NaN
2       cc      3            [0.6, 6.6, 5.5]  0.6  6.6  5.5  NaN  NaN
3       dd      2                 [2.2, 6.6]  2.2  6.6  NaN  NaN  NaN
In [68]:

rename_cols = [col for col in cat if type(col) == int]
rename_weight_cols = ['weight'+str(col + 1) for col in rename_cols]
d = dict(zip(rename_cols, rename_weight_cols))
cat.rename(columns = d,inplace=True)
cat
Out[68]:
  category  count                 weight_cat  weight1  weight2  weight3  \
0       aa      5  [3.2, 2.2, 4.2, 0.5, 3.3]      3.2      2.2      4.2   
1       bb      4       [3.5, 4.5, 7.5, 4.4]      3.5      4.5      7.5   
2       cc      3            [0.6, 6.6, 5.5]      0.6      6.6      5.5   
3       dd      2                 [2.2, 6.6]      2.2      6.6      NaN   

   weight4  weight5  
0      0.5      3.3  
1      4.4      NaN  
2      NaN      NaN  
3      NaN      NaN 

      

So what's above is the first group in the category column and does the aggregation on the weight column, we create a count column, and then we translate all the values ​​for that group into a list and add that.



Then I call apply

this list to turn it into a series, this will automatically generate column names 0..4.

Then I create a dict to rename the columns with weights from 1 to 5 as desired.

+2


source


You can also use unstack

after resetting the index of each group:

dfw = df.groupby(['category'])['weights'].apply(lambda i: i.reset_index(drop=True)).unstack(level=1)

      

The size (here 'count'

) of each group df.groupby(['category']).size()

.



dfw.rename(columns=lambda x: 'weight'+ str(x+1), inplace=True) # string concatenation to give column labels
dfw.insert(0, 'count', df.groupby(['category']).size()) #insert count at position 0

      

This gives:

          count  weight1  weight2  weight3  weight4  weight5
category                                                    
aa            5      3.2      2.2      4.2      0.5      3.3
bb            4      3.5      4.5      7.5      4.4      NaN
cc            3      0.6      6.6      5.5      NaN      NaN
dd            2      2.2      6.6      NaN      NaN      NaN

      

+1


source


Considering that your sample data is represented as a list of dictionaries data

, where each dictionary has a key category

and weight

, then the following code will give you what you need:

trans_data = {}
for item in data:
    if item['category'] in trans_data:
        trans_data[item['category']]['count'] += 1
        trans_data[item['category']]['weights'].append(item['weight'])
    else:
        trans_data[item['category']] = {'count': 1, 'weights': [item['weight'],]}

      

It is assumed that the data structure data

is as follows:

data = [{'category': 'aa', 'weight': 3.2}, {'category': 'bb', 'weight': 2.2}, {'category': 'aa', 'weight': 1.1}]

      

0


source







All Articles