Match a pattern in any column using data.table, not plyr

I have a very large dataset and haven't used data.table before, and I find the syntax a little difficult to follow. My main question is: how can I reproduce what the "apply" function does in data.table?

My data are as follows:

dat1 <- structure(list(id = c(1L, 1L, 2L, 3L), diag1 = structure(1:4, .Label = c("I20.1","I21.3", "I48", "I60.8"), class = "factor"), diag2 = structure(c(3L,2L, 1L, 1L), .Label = c("", "I50", "I60.9"), class = "factor"), diag3 = structure(c(1L, 2L, 1L, 1L), .Label = c("", "I38.1"), class = "factor")), .Names = c("id", "diag1", "diag2", "diag3"), row.names = c(NA, -4L), class = "data.frame")

      

I want to add a variable flagging all records that have a diagnostic code from I20, I21, or I60 in any of the diag1, diag2, or diag3 columns. Using apply and a regex, I did the following:

code.list <- c("I20","I21","I60")
dat1$index <- apply(dat1[2:4], 1, function(i) any(grep(paste(code.list, collapse="|"), i)))

      

This gives me the final dataset I want, shown below:

structure(list(id = c(1L, 1L, 2L, 3L), diag1 = structure(1:4, .Label = c("I20.1","I21.3", "I48", "I60.8"), class = "factor"), diag2 = structure(c(3L,2L, 1L, 1L), .Label = c("", "I50", "I60.9"), class = "factor"),diag3 = structure(c(1L, 2L, 1L, 1L), .Label = c("", "I38.1"), class = "factor"), index = c(TRUE, TRUE, FALSE, TRUE)), .Names = c("id","diag1", "diag2", "diag3", "index"), row.names = c(NA, -4L), class = "data.frame")

      

However, this will take too long on my full dataset using plyr. I was hoping to get the equivalent data.table syntax. Can anyone help?

Thank you in advance

A

1 answer


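We can do this with data.table. Instead of looping over rows, run the vectorised grepl over each of the diag columns (selected with .SDcols and available as .SD), then combine the resulting logical vectors with Reduce and `|`. The result is assigned by reference with `:=`: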



library(data.table)
setDT(dat1)[, index := Reduce(`|`, lapply(.SD, grepl,
         pattern = paste(code.list, collapse="|"))), .SDcols = 2:4]
dat1
#    id diag1 diag2 diag3 index
#1:  1 I20.1 I60.9        TRUE
#2:  1 I21.3   I50 I38.1  TRUE
#3:  2   I48             FALSE
#4:  3 I60.8              TRUE
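
To make the `Reduce`/`lapply` step easier to follow, here is a minimal base R sketch of the same logic, broken into separate steps. It does not need data.table. The names `dat`, `pattern`, and `hits` are illustrative only, and the sample data is retyped from the question with character columns instead of factors.

```r
# Sample data from the question, retyped with plain character columns
dat <- data.frame(
  id    = c(1L, 1L, 2L, 3L),
  diag1 = c("I20.1", "I21.3", "I48", "I60.8"),
  diag2 = c("I60.9", "I50", "", ""),
  diag3 = c("", "I38.1", "", ""),
  stringsAsFactors = FALSE
)
code.list <- c("I20", "I21", "I60")

# Build one regex that matches any of the codes: "I20|I21|I60"
pattern <- paste(code.list, collapse = "|")

# grepl is vectorised, so each diag column yields one logical vector
hits <- lapply(dat[2:4], grepl, pattern = pattern)

# Reduce folds the three vectors together with element-wise OR
index <- Reduce(`|`, hits)
index
# [1]  TRUE  TRUE FALSE  TRUE
```

Because each `grepl` call processes a whole column at once, this does three vectorised passes rather than one function call per row, which is why it should scale better than the row-wise `apply`. Note that the pattern is unanchored, so it matches the codes anywhere in the string. If codes should only match at the start, one option would be to anchor it, for example `paste0("^(", paste(code.list, collapse = "|"), ")")`.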

      
