How can I split a dataframe into two columns and count the number of rows based on a group more efficiently

Question


I have a data.frame with over 120,000 rows. It looks like this:

> head(mydf)
ID MONTH.YEAR VALUE
1 110  JAN. 2012  1000
2 111  JAN. 2012  1000
3 121  FEB. 2012  3000
4 131  FEB. 2012  3000
5 141  MAR. 2012  5000
6 142  MAR. 2012  4000

I want to group the data.frame by the columns MONTH.YEAR and VALUE and count the rows in each group. My expected result should look like this:

MONTH.YEAR VALUE count
JAN. 2012  1000  2
FEB. 2012  3000  2
MAR. 2012  5000  1
MAR. 2012  4000  1

I tried splitting the data.frame and using sapply to count the rows in each group. This is my code:

sp <- split(mydf, list(mydf$MONTH.YEAR, mydf$VALUE), drop = TRUE)
counts <- sapply(sp, nrow)
result <- data.frame(yearandvalue = names(counts), count = counts)

However, I think this approach is very slow. Is there a better way to do this? Many thanks.


r dataframe

Zihu guo May 18 '15 at 4:12


1 answer

akrun · Accepted Answer · 2015-05-18T04:14:03+0000

Try

# Group by every column except ID; the resulting count column is still named ID
aggregate(ID ~ ., mydf, length)

or

library(dplyr)
mydf %>%
    group_by(MONTH.YEAR, VALUE) %>%
    summarise(count = n())

or

library(data.table)
setDT(mydf)[, list(count=.N) , list(MONTH.YEAR, VALUE)]
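
For reference, here is a minimal, self-contained sketch of the base R variant. It assumes the six rows shown in head(mydf) are representative of the full data, and it names the grouping columns explicitly instead of using `.`. The sample data and the `counts` variable name are reconstructed for illustration only and are not part of the original question or answer.

```r
# Sample data rebuilt from the head(mydf) output in the question (assumption:
# the real data has the same three columns).
mydf <- data.frame(
  ID = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE = c(1000, 1000, 3000, 3000, 5000, 4000)
)

# Count rows per (MONTH.YEAR, VALUE) group. ID is only used as a column
# whose length() is taken within each group.
counts <- aggregate(ID ~ MONTH.YEAR + VALUE, data = mydf, FUN = length)

# aggregate() keeps the name of the left-hand side, so rename it to "count".
names(counts)[names(counts) == "ID"] <- "count"

print(counts)
```

Note that the formula interface of aggregate() drops rows containing NA in any of the listed columns by default, and the row order of the result may differ from the order in the question.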

