R data.table multi column recode / sub-assign

Let DT be a data.table.

DT <- data.table(V1=sample(10),
                 V2=sample(10),
                 ...
                 V9=sample(10))

      

Is there a better / simpler way to do a multi-column recode / sub-assign like this:

DT[V1==1 | V1==7,V1:=NA]
DT[V2==1 | V2==7,V2:=NA]
DT[V3==1 | V3==7,V3:=NA]
DT[V4==1 | V4==7,V4:=NA]
DT[V5==1 | V5==7,V5:=NA]
DT[V6==1 | V6==7,V6:=NA]
DT[V7==1 | V7==7,V7:=NA]
DT[V8==1 | V8==7,V8:=NA]
DT[V9==1 | V9==7,V9:=NA]

      

Variable names are completely arbitrary and do not have to contain numbers. There are many columns (Vx:Vx) and one recode pattern for all of them (NAME == 1 | NAME == 7, NAME := something).

Further, how can NA be sub-assigned to something else across multiple columns? For example, in data.frame form:

data[,columns][is.na(data[,columns])] <- a_value

      



1 answer


You can use set() to replace values in multiple columns. According to ?set, it is fast because the overhead of [.data.table is eliminated. We use a for loop to go through the columns and replace the values selected by 'i' and 'j' with NA.

# For each column j, set the rows whose value is 1 or 7 to NA (by reference)
for (j in seq_along(DT)) {
  set(DT, i=which(DT[[j]] %in% c(1,7)), j=j, value=NA)
}
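The same pattern can probably be extended to the second part of the question (turning NA into some other value) and to a chosen subset of columns rather than all of them. The sketch below is only an illustration: the names cols and a_value and the fill value 0L are assumptions made for the example, not part of the original answer.

```r
library(data.table)

set.seed(24)
DT <- data.table(V1 = sample(10), V2 = sample(10), V3 = sample(10))

# Hypothetical choices for illustration: which columns to touch and what to fill in.
cols <- c("V1", "V3")
a_value <- 0L

# Step 1: recode 1 and 7 to NA, but only in the chosen columns.
for (j in cols) {
  set(DT, i = which(DT[[j]] %in% c(1, 7)), j = j, value = NA)
}

# Step 2: sub-assign a_value wherever those columns are now NA.
for (j in cols) {
  set(DT, i = which(is.na(DT[[j]])), j = j, value = a_value)
}
```

Here j is given as a column name instead of a position, which set() also accepts, so the loop works with arbitrary column names. V2 is left untouched because it is not listed in cols.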

      

EDIT: Included @David Arenburg's comments.



data

library(data.table)

set.seed(24)
DT <- data.table(V1=sample(10), V2=sample(10), V3=sample(10))

      
