How to conditionally select rows within each group?
Sample data:
library(data.table)
tmp_dt <- data.table(grp = rep(c(1, 2), each = 5), a = 1:10)
# > tmp_dt
# grp a
# 1: 1 1
# 2: 1 2
# 3: 1 3
# 4: 1 4
# 5: 1 5
# 6: 2 6
# 7: 2 7
# 8: 2 8
# 9: 2 9
# 10: 2 10
I know that I can get a subset of rows for each group using .SD:
tmp_dt[, .SD[c(2,3)], by = grp]
# grp a
# 1: 1 2
# 2: 1 3
# 3: 2 7
# 4: 2 8
What I can't get to work is a conditional subset of rows that depends on grp with data.table. For example, I would like the equivalent of the following dplyr code:
tmp_dt %>%
group_by(grp) %>%
filter(if_else(grp == 1, row_number() == 3, row_number() == 2)) %>%
ungroup
# A tibble: 2 × 2
# grp a
# <dbl> <int>
# 1 1 3
# 2 2 7
With data.table you can do something like this:
tmp_dt[tmp_dt[, .I[if(grp == 1) 3 else 2], grp]$V1]
# grp a
#1: 1 3
#2: 2 7
Note that inside j the grouping variable is a vector of length 1 (unlike the other columns), so you can use a plain if / else instead of ifelse, which is less efficient:
tmp_dt[, length(grp), grp]
# grp V1
#1: 1 1
#2: 2 1
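To make the two steps of the one-liner above explicit, here is a small sketch that splits it apart. The inner call picks, per group, one global row number out of .I, and the outer call subsets the table by those row numbers. The names idx and res are introduced here only for illustration.

```r
library(data.table)

tmp_dt <- data.table(grp = rep(c(1, 2), each = 5), a = 1:10)

# Step 1: within each group, .I holds that group's row numbers in the full table.
# grp has length 1 inside a group, so a plain if / else is enough.
idx <- tmp_dt[, .I[if (grp == 1) 3 else 2], by = grp]$V1

# Step 2: subset the original table by the collected row numbers.
res <- tmp_dt[idx]
```

Here idx holds the row numbers 3 and 7, so res contains the third row of group 1 and the second row of group 2.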
For your example, the if / else approach is probably the way to go.
If you want to extend it a bit, you can use a "look-up" data.table that states which row to take from each group:
grp_dt <- data.table(grp = c(1,2),
row = c(3,2))
tmp_dt[ grp_dt, on = "grp", a[i.row], by = .EACHI]
# tmp_dt[ grp_dt, on = "grp", .(a = a[i.row]), by = .EACHI] ## to keep column name
# grp V1
# 1: 1 3
# 2: 2 7
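As a variation on the same look-up idea, one possible sketch is to add a within-group row number with rowid() and then do an ordinary join on both columns. This is not part of the answer above; the names tmp2, rn and res are hypothetical, and the row column is declared as integer here so that it matches the type rowid() returns.

```r
library(data.table)

tmp_dt <- data.table(grp = rep(c(1, 2), each = 5), a = 1:10)
grp_dt <- data.table(grp = c(1, 2), row = c(3L, 2L))

# Work on a copy so the original table is not modified by reference.
tmp2 <- copy(tmp_dt)

# rn is the position of each row inside its group (1, 2, 3, ...).
tmp2[, rn := rowid(grp)]

# Join on the group and on the requested within-group row number.
res <- tmp2[grp_dt, on = .(grp, rn = row)]
```

With the default nomatch, a requested row number that does not exist in its group comes back as a row with NA in a rather than being dropped silently.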