Finding the most common combinations
I have a dataframe with 2 columns, id and brand:
X1 X2
1234 A89
1234 A87
1234 A87
1234 A32
1234 A27
1234 A27
1235 A12
1235 A14
1235 A14
1236 A32
1236 A32
1236 A27
1236 A12
1236 A12
1236 A14
1236 A89
1236 A87
1237 A99
1237 A98
I want to find combinations of the 3 best brands that are most common in terms of id number:
A89, A87 A32, A27 A12, A14
I tried: library (dplyr)
df %>%
group_by(X1,X2) %>%
mutate(n = n()) %>%
group_by(X1) %>%
slice(which.max(n)) %>%
select(-n)
But it doesn't work right. I would appreciate any thoughts or ideas!
source to share
Here you can do it in base R. Divide X2
by X1
and then get a combination of two values ββfor each subgroup. We then grab the three most common ones.
with(data.frame(table(unlist(lapply(split(df$X2, df$X1), function(x)
combn(unique(x), min(2, length(x)), paste, collapse = "-"))))),
as.character(Var1[head(order(Freq, decreasing = TRUE), 3)]))
#[1] "A12-A14" "A32-A27" "A89-A87"
DATA
df = structure(list(X1 = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L,
1235L, 1235L, 1235L, 1236L, 1236L, 1236L, 1236L, 1236L, 1236L,
1236L, 1236L, 1237L, 1237L), X2 = c("A89", "A87", "A87", "A32",
"A27", "A27", "A12", "A14", "A14", "A32", "A32", "A27", "A12",
"A12", "A14", "A89", "A87", "A99", "A98")), .Names = c("X1",
"X2"), class = "data.frame", row.names = c(NA, -19L))
source to share
I would not use which.max
but stuck with the standarddplyr
Library (dplyr)
df %>%
group_by(X1,X2) %>%
mutate(n = n()) %>%
group_by(X2) %>% # or X1? Unclear which is your sorting variable
top_n(3, X2) %>% # Returns 3 rows for each, can be changed / also could be X1
select(-n)
Note: top_n
will contain more 3
lines if there are links
source to share
Here is another basic R method using R tapply
for grouping and combn
with paste
for pairing.
names(tail(sort(table(unlist(tapply(df$X2, df$X1,
FUN=function(x) combn(unique(x), 2, paste, collapse="-"))))),
3))
[1] "A12-A14" "A32-A27" "A89-A87"
combn(unique(x), 2, paste, collapse="-")
unique pairs and insert them together. tapply
does this over the group and returns a list. unlist
turns it into a vector, and table
calculates the frequencies. These frequencies are sorted from smallest to largest, and we grab the last 3 with tail
. names
extracts a pair, not a score from table
.
source to share
Alternative solution. First we generate all 2 combinations from X2 by group (X1) and then we use dplyr to aggregate and subset the top 3.
combinations = as.data.frame(do.call(rbind,lapply(split(df,df$X1), function(x)
{t(combn(unique(x$X2),2))})))
combinations %>%
group_by(V1,V2) %>%
summarize(n=n()) %>%
arrange(-n) %>%
.[1:3,]
Output:
# A tibble: 3 x 3
# Groups: V1 [3]
V1 V2 n
<fctr> <fctr> <int>
1 A12 A14 2
2 A32 A27 2
3 A89 A87 2
source to share