String operations on selected columns based on substring in data.table
I would like to apply a function to selected columns that match two different substrings. I found this thread related to my question, but I couldn't get an answer from it.
Here is a reproducible example with my failed attempt. For this example, I want to perform a row-wise operation where I sum the values from all columns starting with the string v and subtract the mean of the values in all columns starting with f.
Update: the proposed solution should (a) use the := operator to make the most of data.table's high performance and (b) be flexible enough for other operations, not only mean and sum, which I use here just for simplicity.
library(data.table)
# generate data
dt <- data.table(id= letters[1:5],
v1= 1:5,
v2= 1:5,
f1= 11:15,
f2= 11:15)
dt
#> id v1 v2 f1 f2
#> 1: a 1 1 11 11
#> 2: b 2 2 12 12
#> 3: c 3 3 13 13
#> 4: d 4 4 14 14
#> 5: e 5 5 15 15
# what I've tried
dt[, Y := sum( .SDcols=names(dt) %like% "v" ) - mean( .SDcols=names(dt) %like% "f" ) by = id]
We melt the dataset into "long" format using the measure argument, get the difference between the sum of 'v' and the mean of 'f' grouped by 'id', join the result to the original dataset on the "id" column, and assign ( := ) "V1" as the variable "Y".
dt[melt(dt, measure = patterns("^v", "^f"), value.name = c("v", "f"))[
, sum(v) - mean(f), id], Y :=V1, on = .(id)]
dt
# id v1 v2 f1 f2 Y
#1: a 1 1 11 11 -9
#2: b 2 2 12 12 -8
#3: c 3 3 13 13 -7
#4: d 4 4 14 14 -6
#5: e 5 5 15 15 -5
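To make the chained call above easier to follow, here is a minimal sketch that splits it into three steps. The intermediate names long and agg are illustrative and not part of the original answer; the logic is meant to match the one-liner.

```r
library(data.table)

dt <- data.table(id = letters[1:5],
                 v1 = 1:5, v2 = 1:5,
                 f1 = 11:15, f2 = 11:15)

# Step 1: reshape to long format; each pattern becomes one value column,
# so the "v*" columns are stacked into `v` and the "f*" columns into `f`
long <- melt(dt, measure = patterns("^v", "^f"), value.name = c("v", "f"))

# Step 2: one row per id holding sum of v minus mean of f
# (`agg` and the column name Y are illustrative choices)
agg <- long[, .(Y = sum(v) - mean(f)), by = id]

# Step 3: update join; Y is added to dt by reference, matched on id
dt[agg, Y := i.Y, on = "id"]
```

Naming the aggregated column explicitly in step 2 avoids relying on the auto-generated name V1.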
Another option uses Reduce after creating indexes of the "v" and "f" columns:
nmv <- which(startsWith(names(dt), "v"))
nmf <- which(startsWith(names(dt), "f"))
l1 <- length(nmf)  # number of "f" columns: the mean of "f" divides by this count
dt[, Y := Reduce(`+`, .SD[, nmv, with = FALSE])- (Reduce(`+`, .SD[, nmf, with = FALSE])/l1)]
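The Reduce approach is tied to sum and mean. For the flexibility requested in the question, one possible sketch selects the columns by name and applies any function row by row. The names v_cols, f_cols and Y2, and the choice of max and median, are assumptions made only for illustration.

```r
library(data.table)

dt <- data.table(id = letters[1:5],
                 v1 = 1:5, v2 = 1:5,
                 f1 = 11:15, f2 = 11:15)

# Column names matched by prefix (illustrative variable names)
v_cols <- grep("^v", names(dt), value = TRUE)
f_cols <- grep("^f", names(dt), value = TRUE)

# sum and mean have vectorised row-wise forms in base R
dt[, Y := rowSums(.SD[, v_cols, with = FALSE]) -
          rowMeans(.SD[, f_cols, with = FALSE])]

# Any other function can be applied per row with apply(..., 1, fun);
# max and median are arbitrary stand-ins here
dt[, Y2 := apply(.SD[, v_cols, with = FALSE], 1, max) -
           apply(.SD[, f_cols, with = FALSE], 1, median)]
```

The apply variant converts the selected columns to a matrix and calls the function once per row, so it is likely to be slower than rowSums or rowMeans on large tables; it trades speed for generality.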