An efficient way to do a double aggregation on a data frame

Question

I have a question about aggregating a data frame and then reshaping the result into a wide table.

I have a table with two columns: name and category. category is a factor with 10 levels, from "0" to "9". The data frame looks like this:

name   category
a        0
a        1
a        1
a        4
a        9
b        2
b        2
b        2
b        3
b        7
b        8
c        0
c        0
c        0

The result I want is as follows:

name  category.0  category.1  category.2  category.3  category.4  .....  category.9
a     1           2           0           0           1                  1
b     0           0           3           1           0                  0
c     3           0           0           0           0                  0

It counts how many times each of '0', '1', ..., '9' occurs for each unique name.

To generate the result, I used a simple aggregate call:

new_df <- aggregate(category ~ name,df, FUN=summary)

and then unpack the second column of new_df to get the result.

However, it is too slow. I would like to know if there is a more efficient way to do this.

r aggregate-functions dataframe

zxwjames May 21 '15 at 17:48

1 answer

Colonel Beauvel · Accepted Answer · 2015-05-21T17:59:07+0000

You can use dcast from the reshape2 package:

library(reshape2)

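# Cast to wide format, counting rows per name/category pair
# (length is the default aggregation; stating it avoids the message)
x <- dcast(df, name ~ category, fun.aggregate = length)
# Prefix the count columns with 'category'
setNames(x, c(names(x)[1], paste0('category', names(x)[-1])))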

#  name category0 category1 category2 category3 category4 category7 category8 category9
#1    a         1         2         0         0         1         0         0         1
#2    b         0         0         3         1         0         1         1         0
#3    c         3         0         0         0         0         0         0         0
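
As a possible alternative that was not part of the original answer, base R's table() can build the same counts without an extra package. The sketch below assumes df looks like the sample in the question, with category stored as a factor with levels "0" to "9"; the names counts and wide are illustrative only. Because table() keeps every factor level, categories that never occur (5 and 6 here) show up as zero columns, whereas the dcast output above omits them. Whether it is faster on your data is worth benchmarking rather than assuming.

```r
# Sample data mirroring the question; category is a factor with levels "0".."9"
df <- data.frame(
  name = c(rep("a", 5), rep("b", 6), rep("c", 3)),
  category = factor(c(0, 1, 1, 4, 9, 2, 2, 2, 3, 7, 8, 0, 0, 0), levels = 0:9)
)

# Cross-tabulate name against category (one row per name, one column per level)
counts <- table(df$name, df$category)

# Turn the contingency table into a plain wide data frame
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("category.", names(wide))

# Move the names from the row names into an ordinary column
wide <- cbind(name = rownames(wide), wide)
rownames(wide) <- NULL

print(wide)
```

The result has one row per name and one category.N column per factor level, matching the layout requested in the question.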
