How to conditionally select rows in each group?

Sample data:

library(data.table)

tmp_dt <-
    data.table(grp = rep(c(1,2), each = 5), a = 1:10)

# > tmp_dt
#    grp  a
# 1:   1  1
# 2:   1  2
# 3:   1  3
# 4:   1  4
# 5:   1  5
# 6:   2  6
# 7:   2  7
# 8:   2  8
# 9:   2  9
# 10:  2 10

      

I know that I can get a subset of rows for each group using .SD:

tmp_dt[, .SD[c(2,3)], by = grp]
#    grp a
# 1:   1 2
# 2:   1 3
# 3:   2 7
# 4:   2 8

      

What I can't seem to get to work with data.table is a conditional subset of rows that depends on grp. For example, I would like to get the equivalent of the following dplyr code:

tmp_dt %>%
    group_by(grp) %>%
    filter(if_else(grp == 1, row_number() == 3, row_number() == 2)) %>%
    ungroup

# A tibble: 2 × 2
#     grp     a
#     <dbl> <int>
# 1     1     3
# 2     2     7

      



2 answers


With data.table you can do something like this:

tmp_dt[tmp_dt[, .I[if(grp == 1) 3 else 2], grp]$V1]

#   grp a
#1:   1 3
#2:   2 7

      



Note that within each group the grouping variable in data.table is a vector of length 1 (unlike the other columns), so you can use a plain if / else instead of ifelse, which is less efficient:

tmp_dt[, length(grp), grp]

#   grp V1
#1:   1  1
#2:   2  1
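
Because grp has length 1 inside each group, the same scalar if / else can also be written directly against .SD. This is only a sketch of an alternative form on the sample data from the question; it should return the same two rows, though it may be slower than the .I version on large tables:

```r
library(data.table)

tmp_dt <- data.table(grp = rep(c(1, 2), each = 5), a = 1:10)

# grp is a length-1 vector within each group, so a scalar if / else is safe here
res <- tmp_dt[, .SD[if (grp == 1) 3 else 2], by = grp]
res
```

The result keeps the column name a, so no $V1 lookup is needed afterwards.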

      



For your example, the if / else approach is probably the way to go.

If you want to extend it a bit, you can use a "look-up" data.table to specify which row to take from each group:



grp_dt <- data.table(grp = c(1,2),
                     row = c(3,2))

tmp_dt[ grp_dt, on = "grp", a[i.row], by = .EACHI]
# tmp_dt[ grp_dt, on = "grp", .(a = a[i.row]), by = .EACHI] ## to keep column name

#    grp V1
# 1:   1  3
# 2:   2  7
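
If you need more than one row from some groups, one possible extension is to add a within-group row number and join on it as well. The names pick_dt, idx_dt and idx below are made up for illustration and do not come from the question:

```r
library(data.table)

tmp_dt <- data.table(grp = rep(c(1, 2), each = 5), a = 1:10)

# Hypothetical look-up table: row 3 of group 1, rows 2 and 5 of group 2
pick_dt <- data.table(grp = c(1, 2, 2), row = c(3L, 2L, 5L))

# Work on a copy so tmp_dt itself is not modified by reference
idx_dt <- copy(tmp_dt)[, idx := seq_len(.N), by = grp]

# Join on both the group and the within-group row number
res <- idx_dt[pick_dt, on = .(grp, idx = row), .(grp, a)]
res
```

Each row of the look-up table matches at most one row of the data, so the result has one row per requested (grp, row) pair.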

      
