How to find the top N descending values ​​in a group in dplyr

I have the following dataframe in R

  Serivce     Codes
   ABS         RT
   ABS         RT
   ABS         TY
   ABS         DR
   ABS         DR
   ABS         DR
   ABS         DR
   DEF         RT
   DEF         RT
   DEF         TY
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         TY
   DEF         SE
   DEF         SE

      

What I want is to count the service code in descending order.

  Serivce     Codes    Count
   ABS         DR        4
   ABS         RT        2 
   ABS         TY        1
   DEF         DR        4
   DEF         RT        2
   DEF         TY        2  

      

I am doing the following in r

df%>% 
group_by(Service,Codes) %>% 
summarise(Count = n()) %>%
top_n(n=3,wt = Count) %>% 
arrange(desc(Count)) %>% 
as.data.frame()   

      

But that doesn't give me what is intended.

+4


source to share


3 answers


We can try with count/arrange/slice

df1 %>% 
   count(Service, Codes) %>%
   arrange(desc(n)) %>% 
   group_by(Service) %>% 
   slice(seq_len(3))
# A tibble: 6 x 3
# Groups:   Service [2]
#  Service Codes     n
#    <chr> <chr> <int>
#1     ABS    DR     4
#2     ABS    RT     2
#3     ABS    TY     1
#4     DEF    DR     4
#5     DEF    RT     2
#6     DEF    SE     2

      




In the OP code, we arrange

also need "Service". As @Marius said in the comments, top_n

will contain more lines if there are links. One option is to do the second grouping with "Tools" and slice

(as shown above) or after grouping, we canfilter

df1 %>% 
  group_by(Service,Codes) %>%
  summarise(Count = n()) %>%
  top_n(n=3,wt = Count)  %>%
  arrange(Service, desc(Count)) %>%
  group_by(Service) %>%
  filter(row_number() <=3)

      

+5


source


df%>% count (Service, Codes)%>% mutate (rank = dens_rank (desc (n)))%>% filter (rank & lt; 5)

the number of rows returned for top_n () is exactly the same as row_number ()



n - service group, codes are counted as

+1


source


In an R base, you can do this in two lines.

# get data.frame of counts by service-code pairs
mydf <- data.frame(table(dat))

# get top 3 by service
do.call(rbind, lapply(split(mydf, mydf$Serivce), function(x) x[order(-x$Freq)[1:3],]))

      

This returns

      Serivce Codes Freq
ABS.1     ABS    DR    4
ABS.3     ABS    RT    2
ABS.7     ABS    TY    1
DEF.2     DEF    DR    4
DEF.4     DEF    RT    2
DEF.6     DEF    SE    2

      

In the first line, use table

to get the counters and then convert to data.frame. The second line splits by service, order negative values order

and pull out the first three items. Combine results with do.call

.

0


source







All Articles