How to show an event happened between two dates in R

Question

How to show an event happened between two dates in R

My question seems simple, and I hope it is.

I have a date chart that has the date the disease was diagnosed, a dummy that indicates which drug was included in the patient (or exposed and unexposed group), the drug start and stop date, and the total stop date.

ID    Diag_date    Treatment    End.date    Drug.start    drug.end
1        NA           0         15/03/2002   01/01/2002    01/02/2002
1        NA           1         15/03/2002   01/02/2002    01/03/2002
1        NA           0         15/03/2002   01/03/2002    NA
2    01/04/2002       1         01/05/2002   01/01/2015    01/02/2002
2    01/04/2002       0         01/05/2002   01/02/2002    01/03/2002
2    01/04/2002       0         01/05/2002   01/03/2002    NA

As you can see, the date of the diagnosis does not change over time, but the start and end dates of the medication intake.

It is desirable that I answer two questions:

1.) How to transfer common End.date

to final drug.end

for everyone ID

?

2.) How do I create a binary column that shows if a diagnostic date exists between Drug.start

and drug.end

?

I want my latest data to look like this:

ID    Diag_date    Treatment    End.Date     Drug.start    Drug.end    Event
1        NA           0         15/03/2002   01/01/2002    01/02/2002   0
1        NA           1         15/03/2002   01/02/2002    01/03/2002   0
1        NA           0         15/03/2002   01/03/2002    15/03/2002   0 
2    01/04/2002       1         01/05/2002   01/01/2015    01/02/2002   0
2    01/04/2002       0         01/05/2002   01/02/2002    01/03/2002   0
2    01/04/2002       0         01/05/2002   01/03/2002    01/05/2002   1

Not everyone has a diagnosis date because not everyone in the sample had the disease. The code I wrote is the following:

 for (i in 1:nrow(df)) {
   if ((df$Diag_date[i] >= df$Drug.start[i]) && ( df$Diag_date[i] <= df$Drug.stop[i])) {
df$Event[i] <- 1
} else {
df$Event[i] <- 0
  }
}

the error i get when running this code:

missing value where TRUE/FALSE needed

Any help would be much appreciated.

+3

date r intervals

Lb93 07 jul. 15 at 7:48

source to share

3 answers

Its possible equivalent data.table

library(data.table)

# Converting to dates
Dates <- names(df)[c(2, 4:6)]
setDT(df)[, (Dates) := lapply(.SD, as.IDate, format = "%d/%m/%Y"), .SDcols = Dates] 

# First question
df[is.na(drug.end), drug.end := End.date] 

# Second question
df[Diag_date >= Drug.start & Diag_date <= drug.end, Event := 1L]

+4

David Arenburg 07 jul. 15 at 8:24

source to share

Akrun's answer is enough for the problem. By offering more direct code.

A <- read.table("clipboard", header = T)
Dates <- c("Diag_date", "End.date", "Drug.start", "drug.end")


A[,Dates] <- lapply(A[,Dates],function(x) as.Date(x, format = "%d/%m/%Y"))
A$drug.end[is.na(A$drug.end)] <- as.character(A$End.date[is.na(A$drug.end)])
A$Event <- as.numeric((A$Diag_date >= A$Drug.start & A$Diag_date<=A$drug.end) & !is.na(A$Diag_date))

+2

Omley 07 jul. 15 at 8:16

source to share

akrun · Accepted Answer · 2015-07-07T07:52:54+0000

You may try

library(dplyr)
df1 %>%
    mutate_each(funs(as.Date(., '%d/%m/%Y')), matches('start|end|date')) %>%
     mutate(drug.end= as.Date(ifelse(is.na(drug.end), End.date, 
                         drug.end),origin='1970-01-01'),
    Event= as.integer((Diag_date >= Drug.start & Diag_date<=drug.end) & 
                        !is.na(Diag_date))) #%>%
    #mutate_each(funs(format(., '%d/%m/%Y')), matches('start|end|date'))
 #    ID  Diag_date Treatment   End.date Drug.start   drug.end Event
 #1  1       <NA>         0 2002-03-15 2002-01-01 2002-02-01     0
 #2  1       <NA>         1 2002-03-15 2002-02-01 2002-03-01     0
 #3  1       <NA>         0 2002-03-15 2002-03-01 2002-03-15     0
 #4  2 2002-04-01         1 2002-05-01 2015-01-01 2002-02-01     0
 #5  2 2002-04-01         0 2002-05-01 2002-02-01 2002-03-01     0
 #6  2 2002-04-01         0 2002-05-01 2002-03-01 2002-05-01     1

As @David Arenburg pointed out, it is better to leave the "date" columns as the "Date" class. If you need it in symbol format, just uncomment the last line and run it.

NOTE. Removed group_by

as it was not necessary

How to show an event happened between two dates in R

More articles: