Unique values for 1st group, then 1st and 2nd, etc.

Question

I have a dataframe with 5 different groups:

   id group
1  L1     1
2  L2     1
3  L1     2
4  L3     2
5  L4     2
6  L3     3
7  L5     3
8  L6     3
9  L1     4
10 L4     4
11 L2     5

I would like to know if it is possible to get the unique id values from the 1st group, then the 1st and 2nd, then the 1st, 2nd and 3rd, etc., without a loop. I'm looking for a way to do it with the dplyr or data.table package.

Expected results:

    group      id
1   1          c("L1", "L2")
2   1,2        c("L1", "L2", "L3", "L4")
3   1,2,3      c("L1", "L2", "L3", "L4", "L5", "L6")
4   1,2,3,4    c("L1", "L2", "L3", "L4", "L5", "L6")
5   1,2,3,4,5  c("L1", "L2", "L3", "L4", "L5", "L6")

Data:

structure(list(id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", 
"L6", "L1", "L4", "L2"), group = structure(c(1L, 1L, 2L, 2L, 
2L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor")), .Names = c("id", "group"), row.names = c(NA, 
-11L), class = "data.frame")

+3

r dataframe

Omlere 09 May '17 at 9:30

3 answers

In the same vein as @Cath's answer, but using Reduce(..., accumulate = TRUE) to create an expanding window of groups. Then loop over the sets of groups with lapply to get the unique ids for each window (here d is the data frame from the question):

grp <- Reduce(c, unique(d$group), accumulate = TRUE)

lapply(grp, function(x) unique(d$id[d$group %in% x]))
# [[1]]
# [1] "L1" "L2"
# 
# [[2]]
# [1] "L1" "L2" "L3" "L4"
# 
# [[3]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# 
# [[4]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# 
# [[5]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"

For naming and formatting the result, please refer to @Cath's good answer.
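
To make the "expanding window" step concrete, here is a minimal sketch of what Reduce(c, ..., accumulate = TRUE) returns. It assumes the group labels are taken as plain character strings; the names groups and windows are only illustrative. Working with characters rather than the factor column also sidesteps the fact that c() on factors behaves differently across R versions.

```r
# Group labels as plain character strings (assumed here for illustration)
groups <- c("1", "2", "3", "4", "5")

# Each step appends the next group to everything accumulated so far
windows <- Reduce(c, groups, accumulate = TRUE)

# windows[[1]] holds "1", windows[[2]] holds "1" "2", and so on
print(windows[[3]])
```

Each element of windows can then be passed to %in% exactly as in the lapply call above.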

+6

Henrik May 09 '17 at 10:07

Another method is to use split and Reduce with union and accumulate = TRUE to combine the groups cumulatively:

Reduce(union, split(df$id, df$group), accumulate=TRUE)
[[1]]
[1] "L1" "L2"

[[2]]
[1] "L1" "L2" "L3" "L4"

[[3]]
[1] "L1" "L2" "L3" "L4" "L5" "L6"

[[4]]
[1] "L1" "L2" "L3" "L4" "L5" "L6"

[[5]]
[1] "L1" "L2" "L3" "L4" "L5" "L6"
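
The list above is unnamed. As a rough sketch, the same Reduce idea can also build labels such as "1,2,3" for each element. The data frame is rebuilt here so the snippet runs on its own, and the names by_group and cum_ids are only illustrative.

```r
# Rebuild the question's data (group stored as a factor, as in the dput output)
df <- data.frame(
  id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = factor(c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)),
  stringsAsFactors = FALSE
)

# Split the ids by group, then accumulate the running union
by_group <- split(df$id, df$group)
cum_ids <- Reduce(union, by_group, accumulate = TRUE)

# Label each element with the groups it covers: "1", "1,2", "1,2,3", ...
names(cum_ids) <- Reduce(function(a, b) paste(a, b, sep = ","),
                         names(by_group), accumulate = TRUE)

print(cum_ids[["1,2"]])
```

Note that with accumulate = TRUE the first element is returned as-is, so if the first group could contain repeated ids, applying unique to each group before the Reduce call would be a safer variant.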

+4

lmo May 09 '17 at 12:15

Cath · Accepted Answer · 2017-05-09T09:40:56+0000

With base R, you can do:

# create the "growing" sets of groups
combi_groups <- lapply(seq_along(unique(df$group)), function(i) unique(df$group)[1:i])

# get the unique ID for each set of groups
uniq_ID <- setNames(lapply(combi_groups, function(x) unique(df$id[df$group %in% x])), 
                    sapply(combi_groups, paste, collapse=","))

# $`1`
# [1] "L1" "L2"

# $`1,2`
# [1] "L1" "L2" "L3" "L4"

# $`1,2,3`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"

# $`1,2,3,4`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"

# $`1,2,3,4,5`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"

If you want to format like in your expected output:

data.frame(group=sapply(combi_groups, paste, collapse=", "), id=sapply(uniq_ID, function(x) paste0("c(", paste0("\"", x, "\"", collapse=", "), ")")))
#          group                                    id
#1             1                         c("L1", "L2")
#2          1, 2             c("L1", "L2", "L3", "L4")
#3       1, 2, 3 c("L1", "L2", "L3", "L4", "L5", "L6")
#4    1, 2, 3, 4 c("L1", "L2", "L3", "L4", "L5", "L6")
#5 1, 2, 3, 4, 5 c("L1", "L2", "L3", "L4", "L5", "L6")

Another formatting option:

data.frame(group=rep(names(uniq_ID), sapply(uniq_ID, length)), id=unlist(uniq_ID))
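
For reference, a self-contained sketch of this long format, with one row per (set of groups, id) pair. It rebuilds the data and the intermediate objects so it can be run on its own; the name long and the use of vapply, lengths and use.names = FALSE are small variations assumed here, not part of the code above.

```r
# Rebuild the question's data
df <- data.frame(
  id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = factor(c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)),
  stringsAsFactors = FALSE
)

# Growing sets of groups, kept as character labels
groups <- as.character(unique(df$group))
combi_groups <- lapply(seq_along(groups), function(i) groups[seq_len(i)])

# Unique ids per set, named "1", "1,2", "1,2,3", ...
uniq_ID <- setNames(
  lapply(combi_groups, function(x) unique(df$id[df$group %in% x])),
  vapply(combi_groups, paste, character(1), collapse = ",")
)

# Long format: repeat each label once per id it contains
long <- data.frame(
  group = rep(names(uniq_ID), lengths(uniq_ID)),
  id = unlist(uniq_ID, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(nrow(long))
```

Dropping the names in unlist keeps the generated element names out of the result, where they may otherwise end up as row names.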

Or, if you want to have uniq_ID as a list column:

library(data.table)
data.table(group=sapply(combi_groups, paste, collapse=", "), id=uniq_ID)
#           group                id
#1:             1             L1,L2
#2:          1, 2       L1,L2,L3,L4
#3:       1, 2, 3 L1,L2,L3,L4,L5,L6
#4:    1, 2, 3, 4 L1,L2,L3,L4,L5,L6
#5: 1, 2, 3, 4, 5 L1,L2,L3,L4,L5,L6

data.table(group=sapply(combi_groups, paste, collapse=", "), id=uniq_ID)[2, id]
[[1]]
[1] "L1" "L2" "L3" "L4"
