Selecting top N values within a group in a column using R
I need to select the top two values of count within each [yearmonth] group of the data frame below in R. I have already sorted the data by count and yearmonth. How can I achieve this with the following data?
   yearmonth            name count
1     201310           Dovas     5
2     201310         Indulgd     2
3     201310         Justina     1
4     201310          Jolita     1
5     201311 Shahrukh Sheikh     1
6     201311           Dovas    29
7     201311         Justina    13
8     201311            Lina     8
9     201312         sUPERED     7
10    201312     John Hansen     7
11    201312         Lina D.     6
12    201312       joanna1st     5
Or using data.table (mydf from @jazzurro's post). Some options:
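library(data.table)
# Sort by yearmonth and descending count, then take the first two rows of each group
setDT(mydf)[order(yearmonth, -count), .SD[1:2], by = yearmonth]
or
# Collect the row indices (.I) of the first two rows per group, then subset by them
setDT(mydf)[mydf[order(yearmonth, -count), .I[1:2], by = yearmonth]$V1, ]
or
# Sort in place with setorder(), then take the first two rows of each group
setorder(setkey(setDT(mydf), yearmonth), yearmonth, -count)[
  , .SD[1:2], by = yearmonth]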
#   yearmonth        name count
#1:    201310       Dovas     5
#2:    201310     Indulgd     2
#3:    201311       Dovas    29
#4:    201311     Justina    13
#5:    201312     sUPERED     7
#6:    201312 John Hansen     7
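One caveat with .SD[1:2]: a group that has fewer than two rows is padded with an NA row. A minimal sketch, assuming that short groups should simply return the rows they have, swaps in head(.SD, 2). The data is rebuilt here so the block runs on its own; the extra 201401 / Solo row is made up for illustration and is not part of the original data.

```r
library(data.table)

# Sample data from the question, plus one hypothetical single-row group
# ("201401" / "Solo") to show the behaviour on short groups.
dt <- data.table(
  yearmonth = c(rep(c("201310", "201311", "201312"), each = 4), "201401"),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st", "Solo"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5, 3)
)

# Sort by group and descending count, then keep at most two rows per group.
# head() never returns more rows than the group has, so no NA rows appear.
top2 <- dt[order(yearmonth, -count), head(.SD, 2), by = yearmonth]
print(top2)
```

With .SD[1:2] in place of head(.SD, 2), the 201401 group would come back with a second row of NA values.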
Here's one way:
library(dplyr)
mydf %>%
  group_by(yearmonth) %>%
  arrange(desc(count)) %>%
  slice(1:2)
#  yearmonth        name count
#1    201310       Dovas     5
#2    201310     Indulgd     2
#3    201311       Dovas    29
#4    201311     Justina    13
#5    201312     sUPERED     7
#6    201312 John Hansen     7
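In dplyr 1.0.0 and later, the arrange-then-slice pattern can probably be shortened with slice_max(). The following is a sketch of that alternative, not part of the original answer. It assumes ties at the cutoff should be broken arbitrarily (with_ties = FALSE) so that each group returns at most two rows; with the default with_ties = TRUE, a group may return more than two rows when counts tie at the boundary.

```r
library(dplyr)

# Same sample data as in the DATA section of this answer.
mydf <- data.frame(
  yearmonth = rep(c("201310", "201311", "201312"), each = 4),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
  stringsAsFactors = FALSE
)

# Keep the two rows with the largest count in each yearmonth group.
top2 <- mydf %>%
  group_by(yearmonth) %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top2)
```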
DATA
mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
                   name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
                            "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
                            "Lina D.", "joanna1st"),
                   count = c(5,2,1,1,1,29,13,8,7,7,6,5),
                   stringsAsFactors = FALSE)
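If neither package is available, the same selection can be sketched in base R. This is an illustrative alternative rather than something from the answers above: it sorts the rows, numbers them within each group using ave(), and keeps positions 1 and 2. The helper names ordered and rank_in_group are made up for this example.

```r
# Same sample data as above, repeated so the block runs on its own.
mydf <- data.frame(
  yearmonth = rep(c("201310", "201311", "201312"), each = 4),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
  stringsAsFactors = FALSE
)

# Sort by group, then by descending count within each group.
ordered <- mydf[order(mydf$yearmonth, -mydf$count), ]

# Number the rows 1, 2, 3, ... within each yearmonth group.
rank_in_group <- ave(seq_along(ordered$count), ordered$yearmonth,
                     FUN = seq_along)

# Keep the first two rows of every group.
top2 <- ordered[rank_in_group <= 2, ]
print(top2)
```

Groups with fewer than two rows simply return the rows they have, since the filter only compares positions.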