How to find the top N descending values in a group in dplyr

Question


I have the following data frame in R:

  Service     Codes
   ABS         RT
   ABS         RT
   ABS         TY
   ABS         DR
   ABS         DR
   ABS         DR
   ABS         DR
   DEF         RT
   DEF         RT
   DEF         TY
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         TY
   DEF         SE
   DEF         SE

What I want is to count the codes within each service and keep the top three per service, in descending order of count:

  Service     Codes    Count
   ABS         DR        4
   ABS         RT        2 
   ABS         TY        1
   DEF         DR        4
   DEF         RT        2
   DEF         TY        2

I am doing the following in R:

library(dplyr)

df %>%
  group_by(Service, Codes) %>%
  summarise(Count = n()) %>%
  top_n(n = 3, wt = Count) %>%
  arrange(desc(Count)) %>%
  as.data.frame()

But that doesn't give me the intended result.


Neil, Jul 28 '17 at 5:28 am


3 answers

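df %>%
  count(Service, Codes) %>%
  group_by(Service) %>%                       # rank within each service
  mutate(rank = dense_rank(desc(n))) %>%
  filter(rank <= 3)

As with top_n(), tied counts share a rank, so a group can return more than three rows; use row_number() instead of dense_rank() if you need exactly n rows per group.

Here n is the count of each code within its service group, as returned by count().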
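If exactly three rows per service are needed even when counts tie, one option is to swap dense_rank() for row_number(), which breaks ties by row order. A minimal sketch, assuming the question's data with the column spelled Service; the data frame below is rebuilt by hand from the table in the question:

```r
library(dplyr)

# Data rebuilt by hand from the question's table (column spelled Service)
df <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
            "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE"),
  stringsAsFactors = FALSE
)

top3 <- df %>%
  count(Service, Codes) %>%                # one row per Service/Codes pair, count in n
  group_by(Service) %>%                    # rank within each service, not across all rows
  mutate(rank = row_number(desc(n))) %>%   # ties get distinct ranks, unlike dense_rank()
  filter(rank <= 3) %>%
  ungroup()
```

For DEF, RT, SE and TY all have a count of 2; row_number() keeps whichever of them come first in the counted data, so exactly three rows remain.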


user6320548, Oct 8 '19 at 8:35


In base R, you can do this in two lines:

# get a data.frame of counts for every Service/Codes pair
mydf <- data.frame(table(dat))

# get the top 3 by Service: sort each group by descending Freq, keep the first 3 rows
do.call(rbind, lapply(split(mydf, mydf$Service), function(x) x[order(-x$Freq)[1:3], ]))

This returns

      Service Codes Freq
ABS.1     ABS    DR    4
ABS.3     ABS    RT    2
ABS.7     ABS    TY    1
DEF.2     DEF    DR    4
DEF.4     DEF    RT    2
DEF.6     DEF    SE    2

The first line uses table to get the counts and then converts them to a data.frame. The second line splits by service, uses order on the negated counts to sort each group in descending order, and pulls out the first three rows. The results are combined with do.call and rbind.
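Because table() lists every Service/Codes combination, a service with fewer than three codes would get zero-count pairs in its [1:3] selection. A sketch of a variant that drops those pairs first and uses head(), which returns fewer rows rather than NA rows for small groups (the dat data frame is rebuilt by hand from the question):

```r
# Data rebuilt by hand from the question's table (column spelled Service)
dat <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
            "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE"),
  stringsAsFactors = FALSE
)

# Counts for every Service/Codes pair; table() also lists pairs with 0 occurrences
mydf <- data.frame(table(dat))
mydf <- mydf[mydf$Freq > 0, ]   # drop pairs that never occur

# Within each service, sort by descending Freq and keep at most 3 rows;
# head() returns fewer rows instead of NA rows when a group is small
top3 <- do.call(rbind, lapply(split(mydf, mydf$Service),
                              function(x) head(x[order(-x$Freq), ], 3)))
```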


lmo, Jul 28 '17 at 12:38


akrun · Accepted Answer · 2017-07-28T05:29:37+0000

We can try with count/arrange/slice

df1 %>% 
   count(Service, Codes) %>%
   arrange(desc(n)) %>% 
   group_by(Service) %>% 
   slice(seq_len(3))
# A tibble: 6 x 3
# Groups:   Service [2]
#  Service Codes     n
#    <chr> <chr> <int>
#1     ABS    DR     4
#2     ABS    RT     2
#3     ABS    TY     1
#4     DEF    DR     4
#5     DEF    RT     2
#6     DEF    SE     2

In the OP's code, the arrange step also needs "Service". As @Marius said in the comments, top_n will return more rows if there are ties. One option is to group a second time by "Service" and use slice (as shown above), or, after grouping, we can use filter:

df1 %>% 
  group_by(Service,Codes) %>%
  summarise(Count = n()) %>%
  top_n(n=3,wt = Count)  %>%
  arrange(Service, desc(Count)) %>%
  group_by(Service) %>%
  filter(row_number() <=3)
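The tie behaviour of top_n() can be checked directly. A sketch assuming dplyr >= 1.0.0, where slice_max() with with_ties = FALSE is available as an alternative to top_n() followed by filter(row_number() <= 3); df1 is rebuilt by hand from the question's table, and the count column is named Count through count()'s name argument:

```r
library(dplyr)

# Data rebuilt by hand from the question's table (column spelled Service)
df1 <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
            "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE"),
  stringsAsFactors = FALSE
)

counts <- df1 %>% count(Service, Codes, name = "Count")  # one row per Service/Codes pair

# top_n() keeps every row tied with the 3rd-largest count, so DEF returns 4 rows
with_ties <- counts %>%
  group_by(Service) %>%
  top_n(n = 3, wt = Count)

# slice_max() with with_ties = FALSE returns at most 3 rows per service,
# sorted by descending Count within each group
exact <- counts %>%
  group_by(Service) %>%
  slice_max(order_by = Count, n = 3, with_ties = FALSE) %>%
  ungroup()
```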
