How can I group a data frame by two columns and count the rows in each group more efficiently?
I have a data.frame with over 120,000 rows. It looks like this:
> head(mydf)
   ID MONTH.YEAR VALUE
1 110  JAN. 2012  1000
2 111  JAN. 2012  1000
3 121  FEB. 2012  3000
4 131  FEB. 2012  3000
5 141  MAR. 2012  5000
6 142  MAR. 2012  4000
I want to group the data.frame by the MONTH.YEAR and VALUE columns and count the rows in each group. The expected output should look like this:
MONTH.YEAR VALUE count
 JAN. 2012  1000     2
 FEB. 2012  3000     2
 MAR. 2012  5000     1
 MAR. 2012  4000     1
I tried splitting the data.frame and using sapply to count the rows in each group. This is my code:
sp <- split(mydf, list(mydf$MONTH.YEAR, mydf$VALUE), drop=TRUE);
result <- data.frame(yearandvalue = names(sapply(sp, nrow)), count = sapply(sp, nrow))
but I think this approach is very slow. Is there a better way to do this? Many thanks.
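One possible direction, sketched with base R only: split() on a data.frame builds a separate data.frame for every group, which is more work than counting needs. aggregate() with FUN = length can count the rows per MONTH.YEAR and VALUE pair directly and returns only the combinations that actually occur. The sample data below is reconstructed from the head(mydf) output above, and renaming the counted column to count is my own choice to match the expected output. Whether this is actually faster on the full 120,000 rows is an assumption that should be checked with system.time().

```r
# Sample data reconstructed from the head(mydf) output in the question
mydf <- data.frame(
  ID = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE = c(1000, 1000, 3000, 3000, 5000, 4000),
  stringsAsFactors = FALSE
)

# Count the rows in each (MONTH.YEAR, VALUE) group.
# length() is applied to the ID values of each group, so the ID column
# of the result holds the group size.
result <- aggregate(ID ~ MONTH.YEAR + VALUE, data = mydf, FUN = length)

# Rename the counted column (the name "count" is taken from the expected output)
names(result)[names(result) == "ID"] <- "count"

print(result)
```

Note that the formula interface of aggregate() drops rows containing NA in the listed columns by default, and the result rows are ordered by the grouping columns rather than by first appearance.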