Finding the most common combinations

I have a dataframe with 2 columns, id and brand:

X1     X2
1234   A89
1234   A87
1234   A87
1234   A32
1234   A27
1234   A27
1235   A12
1235   A14
1235   A14
1236   A32
1236   A32
1236   A27
1236   A12
1236   A12
1236   A14
1236   A89
1236   A87
1237   A99
1237   A98

      

I want to find combinations of the 3 best brands that are most common in terms of id number:

A89, A87
A32, A27
A12, A14

      

I tried: library (dplyr)

 df %>% 
  group_by(X1,X2) %>%
  mutate(n = n()) %>%
  group_by(X1) %>%
  slice(which.max(n)) %>%
  select(-n)

      

But it doesn't work right. I would appreciate any thoughts or ideas!

+3


source to share


4 answers


Here you can do it in base R. Divide X2

by X1

and then get a combination of two values ​​for each subgroup. We then grab the three most common ones.

with(data.frame(table(unlist(lapply(split(df$X2, df$X1), function(x)
    combn(unique(x), min(2, length(x)), paste, collapse = "-"))))),
    as.character(Var1[head(order(Freq, decreasing = TRUE), 3)]))
#[1] "A12-A14" "A32-A27" "A89-A87"

      



DATA

df = structure(list(X1 = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 
1235L, 1235L, 1235L, 1236L, 1236L, 1236L, 1236L, 1236L, 1236L, 
1236L, 1236L, 1237L, 1237L), X2 = c("A89", "A87", "A87", "A32", 
"A27", "A27", "A12", "A14", "A14", "A32", "A32", "A27", "A12", 
"A12", "A14", "A89", "A87", "A99", "A98")), .Names = c("X1", 
"X2"), class = "data.frame", row.names = c(NA, -19L))

      

+2


source


I would not use which.max

but stuck with the standarddplyr

Library (dplyr)



df %>% 
  group_by(X1,X2) %>%
  mutate(n = n()) %>%
  group_by(X2) %>% # or X1? Unclear which is your sorting variable
  top_n(3, X2) %>% # Returns 3 rows for each, can be changed / also could be X1
  select(-n)

      

Note: top_n

will contain more 3

lines if there are links

+1


source


Here is another basic R method using R tapply

for grouping and combn

with paste

for pairing.

names(tail(sort(table(unlist(tapply(df$X2, df$X1,
                                FUN=function(x) combn(unique(x), 2, paste, collapse="-"))))),
           3))
[1] "A12-A14" "A32-A27" "A89-A87"

      

combn(unique(x), 2, paste, collapse="-")

unique pairs and insert them together. tapply

does this over the group and returns a list. unlist

turns it into a vector, and table

calculates the frequencies. These frequencies are sorted from smallest to largest, and we grab the last 3 with tail

. names

extracts a pair, not a score from table

.

+1


source


Alternative solution. First we generate all 2 combinations from X2 by group (X1) and then we use dplyr to aggregate and subset the top 3.

combinations = as.data.frame(do.call(rbind,lapply(split(df,df$X1), function(x) 
    {t(combn(unique(x$X2),2))})))

combinations %>% 
group_by(V1,V2) %>% 
summarize(n=n()) %>% 
arrange(-n) %>% 
.[1:3,]

      

Output:

# A tibble: 3 x 3
# Groups:   V1 [3]
      V1     V2     n
  <fctr> <fctr> <int>
1    A12    A14     2
2    A32    A27     2
3    A89    A87     2

      

0


source







All Articles