Tough loop in R?

I have been struggling for days to solve this problem in R (I am a former SAS user).

Setting / study:
- Observational data on Crohn's disease patients, collected annually during 2002-2013.
- Patients can be enrolled in any year, and visits do not necessarily happen every year.
- I know the exact day of death for each patient. Variable: death_year
- I know the exact day of the relapse (the endpoint of interest). Variable: relapse_year

I am interested in the incidence of relapse, so for each year I need the number of relapses divided by the number of people alive that year. The problem is that patients are enrolled and seen irregularly, but for every year I do know whether they were alive and whether they had relapsed.

I could solve this if I could create 12 new variables for each patient. Each new variable represents a calendar year and must be set to "1" if the patient is alive that year and has not yet experienced the event.

So the problem is that I need to create "year variables" that are set to "1" for the year of inclusion and for every year after that, as long as the person has not died or experienced the event.

Example: Patient X was included in 2005 and died in 2009. For him I need the variables "2005", "2006", "2007", "2008" and "2009" set to "1". Patient Y was included in 2005 and experienced the event in 2007. For him I need the variables "2005", "2006" and "2007" set to "1". (Yes, the year of the event / death should still be set to "1".)

This is what my dataset looks like:

data <- read.table(header = TRUE, text = "
patient     visit   first_visit relapse_year     death_year 
1          2003 2003    .   2010    
1          2004 2003    .   2010    
1          2009 2003    .   2010    
2          2002 2002    2006    .   
2          2006 2002    2006    .   
2          2006 2002    2006    .   
2          2008 2002    2006    .   
2          2012 2002    2006    .   
3          2004 2004    .   .   
3          2008 2004    .   .   
3          2008 2004    .   .
")

      

Here is the desired data set:

desired_data <- read.table(header = TRUE, text = "
patient     visit     first_visit   relapse_year    death_year YEAR2002     YEAR2003    YEAR2004    YEAR2005    YEAR2006    YEAR2007    YEAR2008    YEAR2009    YEAR2010    YEAR2011    YEAR2012
1          2003 2003    .   2010    .   1   1   1   1   1   1   1   1   .   .
1          2004 2003    .   2010    .   1   1   1   1   1   1   1   1   .   .
1          2009 2003    .   2010    .   1   1   1   1   1   1   1   1   .   .
2          2002 2002    2006    .   1   1   1   1   1   .   .   .   .   .   .
2          2006 2002    2006    .   1   1   1   1   1   .   .   .   .   .   .
2          2006 2002    2006    .   1   1   1   1   1   .   .   .   .   .   .
2          2008 2002    2006    .   1   1   1   1   1   .   .   .   .   .   .
2          2012 2002    2006    .   1   1   1   1   1   .   .   .   .   .   .
3          2004 2004    .   .   .   .   1   1   1   1   1   1   1   1   1
3          2008 2004    .   .   .   .   1   1   1   1   1   1   1   1   1
3          2008 2004    .   .   .   .   1   1   1   1   1   1   1   1   1
")

      

I would be extremely grateful for any advice on this matter! Thanks in advance!



1 answer


It's a bit hacky, but it will work. First turn your data into a numeric data frame so that "." turns into NA (R will warn that NAs were introduced by coercion, which is expected here):

data0<-data.frame(lapply(data,function(x) as.numeric(as.character(x))))
head(data0)
#    patient visit first_visit relapse_year death_year
# 1        1  2003        2003           NA       2010
# 2        1  2004        2003           NA       2010
# 3        1  2009        2003           NA       2010
# 4        2  2002        2002         2006         NA
# 5        2  2006        2002         2006         NA
# 6        2  2006        2002         2006         NA
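As a possible alternative to converting after the fact, the "." placeholders can be declared as missing when the data are read. This is only a sketch on a three-row subset of the question's data; the name data_na is made up for the illustration.

```r
# Three rows taken from the question's data (one per patient).
# na.strings = "." makes read.table() treat "." as NA,
# so the year columns are read as numbers straight away.
data_na <- read.table(header = TRUE, na.strings = ".", text = "
patient visit first_visit relapse_year death_year
1 2003 2003 . 2010
2 2002 2002 2006 .
3 2004 2004 . .
")
str(data_na)
```

With this approach there is no coercion warning, because no column is ever read as text.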

      

Then replace the NA values with 2012 (or whatever the last year was).



data0[is.na(data0)]<-2012
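Overwriting every NA works here, but it also erases the information about who never relapsed or died. A minimal sketch of a variant that leaves the NAs in place, assuming 2012 is the last year of follow-up, passes that year to pmin() as an extra argument together with na.rm = TRUE. The three vectors below are illustrative values for patients 1, 2 and 3 from the question.

```r
relapse_year <- c(NA, 2006, NA)   # patients 1, 2, 3 from the question
death_year   <- c(2010, NA, NA)
last_year    <- 2012              # assumed end of follow-up

# Element-wise minimum, ignoring NAs; last_year is recycled for every patient,
# so the result is never NA.
end_year <- pmin(relapse_year, death_year, last_year, na.rm = TRUE)
end_year
# [1] 2010 2006 2012
```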

      

Now you can use pmin to find the last year each patient counts as active, that is, the year of relapse, death or the end of the study, whichever comes first. The last thing to do is use arithmetic on the matrix row and column indices to create a new dataset:

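# One row per visit, one column per calendar year
activeYears<-matrix(0,nrow(data0),11)
colnames(activeYears)<-2002:2012
# Expand the per-row values to one value per matrix cell
startYear<-data0$first_visit[row(activeYears)]
# Last active year: relapse or death, whichever comes first (2012 if neither)
endYear<-pmin(data0$relapse_year[row(activeYears)],data0$death_year[row(activeYears)])
# Calendar year of each cell (column 1 is 2002)
colYear<-col(activeYears)+2001
# 1 if the year lies between inclusion and the last active year, otherwise 0
activeYears[]<-startYear<=colYear & endYear>=colYear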
activeYears
#      2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
# [1,]    0    1    1    1    1    1    1    1    1    0    0
# [2,]    0    1    1    1    1    1    1    1    1    0    0
# [3,]    0    1    1    1    1    1    1    1    1    0    0
# [4,]    1    1    1    1    1    0    0    0    0    0    0
# [5,]    1    1    1    1    1    0    0    0    0    0    0
# [6,]    1    1    1    1    1    0    0    0    0    0    0
# [7,]    1    1    1    1    1    0    0    0    0    0    0
# [8,]    1    1    1    1    1    0    0    0    0    0    0
# [9,]    0    0    1    1    1    1    1    1    1    1    1
#[10,]    0    0    1    1    1    1    1    1    1    1    1
#[11,]    0    0    1    1    1    1    1    1    1    1    1
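To get from such a matrix to what the question asks for, the flags still have to be attached to the visit rows and the yearly incidence computed. The following is a rough sketch rather than a definitive solution. It assumes 2002-2012 as the follow-up window, reads "." as NA, and counts each patient once per year instead of once per visit. The names visits, patients, at_risk, flags, desired and incidence are made up for the illustration.

```r
# Toy data from the question; "." is read as NA.
visits <- read.table(header = TRUE, na.strings = ".", text = "
patient visit first_visit relapse_year death_year
1 2003 2003 . 2010
1 2004 2003 . 2010
1 2009 2003 . 2010
2 2002 2002 2006 .
2 2006 2002 2006 .
2 2006 2002 2006 .
2 2008 2002 2006 .
2 2012 2002 2006 .
3 2004 2004 . .
3 2008 2004 . .
3 2008 2004 . .
")

years <- 2002:2012
last_year <- max(years)  # assumed end of follow-up

# One row per patient, so nobody is counted once per visit.
patients <- unique(visits[, c("patient", "first_visit", "relapse_year", "death_year")])

# Last year at risk: relapse, death or end of follow-up, whichever is first.
end_year <- pmin(patients$relapse_year, patients$death_year, last_year, na.rm = TRUE)

# at_risk[i, j] is TRUE when patient i is under observation in years[j].
at_risk <- outer(patients$first_visit, years, "<=") & outer(end_year, years, ">=")
colnames(at_risk) <- paste0("YEAR", years)

# Visit-level table in the layout the question asks for (1 or NA).
flags <- ifelse(at_risk, 1, NA)
desired <- cbind(visits, flags[match(visits$patient, patients$patient), ])

# Yearly incidence: relapses in a year / patients at risk in that year.
n_at_risk <- colSums(at_risk)
n_relapse <- as.vector(table(factor(patients$relapse_year, levels = years)))
incidence <- data.frame(year = years, n_at_risk, n_relapse,
                        rate = n_relapse / n_at_risk, row.names = NULL)
incidence
```

Collapsing to one row per patient before counting matters: in the visit-level matrix a patient with five visits would contribute five times to the denominator.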

      
