How to use an apply function over a character vector of column names inside data.table
I'm trying to get an overview of the availability of my data, which might look like this:
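library(data.table)
set.seed(1)  # any seed works; only the NA pattern below matters
DT <- data.table(id = rep(c("a", "b"), each = 20), time = rep(1991:2010, 2),
                 x = rbeta(40, shape1 = 1, shape2 = 2),
                 y = rnorm(40))
# Some values are NA, in contiguous runs (no gaps inside a series):
DT[id == "a" & time < 2000, x := NA]
DT[id == "b" & time > 2005, y := NA]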
My real data has many more variables. Ideally, I would like to see a table like this:
a b
x 2000-2010 1991-2010
y 1991-2010 1991-2005
That way I can see, for each variable and id, the period it covers without gaps. I can get this for one variable:
DT[, availability_x := paste0(
       as.character(min(ifelse(!is.na(x), time, NA), na.rm = TRUE)),
       "-",
       as.character(max(ifelse(!is.na(x), time, NA), na.rm = TRUE))),
   by = id]
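A shorter way to express the same single-variable computation, offered as a sketch: subset time to the years where x is observed and let range() return the minimum and maximum together. It assumes the same example data as above (the seed is arbitrary and does not affect the result).

```r
library(data.table)

set.seed(1)  # arbitrary; the NA pattern below does not depend on it
DT <- data.table(id = rep(c("a", "b"), each = 20), time = rep(1991:2010, 2),
                 x = rbeta(40, shape1 = 1, shape2 = 2), y = rnorm(40))
DT[id == "a" & time < 2000, x := NA]

# Keep only the years where x is observed, then paste their min and max
DT[, availability_x := paste(range(time[!is.na(x)]), collapse = "-"), by = id]
```

This avoids the ifelse() and na.rm detour, because NA years are dropped before min and max are taken.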
But I actually want to do this for many variables, and all my attempts have failed because I am having a hard time passing a vector of column names to data.table. I assume the solution goes in the direction of this or, but my attempts to adapt these solutions to a vector of column names have failed.
For example, with lapply the elements of the character vector are not evaluated as columns:
cols <- c("x", "y")
availabilityfunction <- function(i) {
  # i is the string "x" or "y", not the column itself,
  # so is.na(i) never looks at the data
  DT[, paste0("avail_", i) := paste0(
         as.character(min(ifelse(!is.na(i), time, NA), na.rm = TRUE)),
         "-",
         as.character(max(ifelse(!is.na(i), time, NA), na.rm = TRUE))),
     by = id]
}
lapply(cols, availabilityfunction)
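One way the attempt could be repaired, as a minimal sketch: look the column up by name with get(), so that is.na() is applied to the data rather than to the string. The loop variable col and the avail_ prefix are illustrative; wrapping the LHS in parentheses tells := to evaluate it as a vector of column names.

```r
library(data.table)

set.seed(1)  # arbitrary; only the NA pattern matters
DT <- data.table(id = rep(c("a", "b"), each = 20), time = rep(1991:2010, 2),
                 x = rbeta(40, shape1 = 1, shape2 = 2), y = rnorm(40))
DT[id == "a" & time < 2000, x := NA]
DT[id == "b" & time > 2005, y := NA]

cols <- c("x", "y")
for (col in cols) {
  # get(col) returns the column named by the string, so is.na() tests the values
  DT[, (paste0("avail_", col)) := paste(range(time[!is.na(get(col))]), collapse = "-"),
     by = id]
}
```

A for loop is used instead of lapply because := modifies DT by reference, so there is no return value worth collecting.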
We can loop (lapply) through the columns of interest, specified in .SDcols, after grouping by 'id': create a logical index of the non-NA elements (!is.na), find the numeric indices (which), take their range (i.e. min and max), use these to subset the 'time' column, and paste the two resulting elements of time together.
DT[, lapply(.SD, function(x) paste(time[range(which(!is.na(x)))],
collapse="-")), by = id, .SDcols = x:y]
# id x y
#1: a 2000-2010 1991-2010
#2: b 1991-2010 1991-2005
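The result above has one row per id. To get the layout sketched in the question (variables as rows, ids as columns), one option, assuming data.table's melt() and dcast() methods, is to reshape it. Note that indexing time by position with range(which(...)) assumes the rows are sorted by time within each id.

```r
library(data.table)

set.seed(1)  # arbitrary; only the NA pattern matters
DT <- data.table(id = rep(c("a", "b"), each = 20), time = rep(1991:2010, 2),
                 x = rbeta(40, shape1 = 1, shape2 = 2), y = rnorm(40))
DT[id == "a" & time < 2000, x := NA]
DT[id == "b" & time > 2005, y := NA]

# One row per id, one column per variable (the answer's approach)
avail <- DT[, lapply(.SD, function(v) paste(time[range(which(!is.na(v)))], collapse = "-")),
            by = id, .SDcols = x:y]

# Long format: one row per (id, variable) pair
long <- melt(avail, id.vars = "id", variable.name = "variable", value.name = "period")

# Wide again, but with variables as rows and ids as columns
wide <- dcast(long, variable ~ id, value.var = "period")
print(wide)
```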