How to show an event happened between two dates in R
My question seems simple, and I hope it is.
I have a date chart that has the date the disease was diagnosed, a dummy that indicates which drug was included in the patient (or exposed and unexposed group), the drug start and stop date, and the total stop date.
ID Diag_date Treatment End.date Drug.start drug.end
1 NA 0 15/03/2002 01/01/2002 01/02/2002
1 NA 1 15/03/2002 01/02/2002 01/03/2002
1 NA 0 15/03/2002 01/03/2002 NA
2 01/04/2002 1 01/05/2002 01/01/2015 01/02/2002
2 01/04/2002 0 01/05/2002 01/02/2002 01/03/2002
2 01/04/2002 0 01/05/2002 01/03/2002 NA
As you can see, the date of the diagnosis does not change over time, but the start and end dates of the medication intake.
It is desirable that I answer two questions:
1.) How to transfer common End.date
to final drug.end
for everyone ID
?
2.) How do I create a binary column that shows if a diagnostic date exists between Drug.start
and drug.end
?
I want my latest data to look like this:
ID Diag_date Treatment End.Date Drug.start Drug.end Event
1 NA 0 15/03/2002 01/01/2002 01/02/2002 0
1 NA 1 15/03/2002 01/02/2002 01/03/2002 0
1 NA 0 15/03/2002 01/03/2002 15/03/2002 0
2 01/04/2002 1 01/05/2002 01/01/2015 01/02/2002 0
2 01/04/2002 0 01/05/2002 01/02/2002 01/03/2002 0
2 01/04/2002 0 01/05/2002 01/03/2002 01/05/2002 1
Not everyone has a diagnosis date because not everyone in the sample had the disease. The code I wrote is the following:
for (i in 1:nrow(df)) {
if ((df$Diag_date[i] >= df$Drug.start[i]) && ( df$Diag_date[i] <= df$Drug.stop[i])) {
df$Event[i] <- 1
} else {
df$Event[i] <- 0
}
}
the error i get when running this code:
missing value where TRUE/FALSE needed
Any help would be much appreciated.
source to share
You may try
library(dplyr)
df1 %>%
mutate_each(funs(as.Date(., '%d/%m/%Y')), matches('start|end|date')) %>%
mutate(drug.end= as.Date(ifelse(is.na(drug.end), End.date,
drug.end),origin='1970-01-01'),
Event= as.integer((Diag_date >= Drug.start & Diag_date<=drug.end) &
!is.na(Diag_date))) #%>%
#mutate_each(funs(format(., '%d/%m/%Y')), matches('start|end|date'))
# ID Diag_date Treatment End.date Drug.start drug.end Event
#1 1 <NA> 0 2002-03-15 2002-01-01 2002-02-01 0
#2 1 <NA> 1 2002-03-15 2002-02-01 2002-03-01 0
#3 1 <NA> 0 2002-03-15 2002-03-01 2002-03-15 0
#4 2 2002-04-01 1 2002-05-01 2015-01-01 2002-02-01 0
#5 2 2002-04-01 0 2002-05-01 2002-02-01 2002-03-01 0
#6 2 2002-04-01 0 2002-05-01 2002-03-01 2002-05-01 1
As @David Arenburg pointed out, it is better to leave the "date" columns as the "Date" class. If you need it in symbol format, just uncomment the last line and run it.
NOTE. Removed group_by
as it was not necessary
source to share
Its possible equivalent data.table
library(data.table)
# Converting to dates
Dates <- names(df)[c(2, 4:6)]
setDT(df)[, (Dates) := lapply(.SD, as.IDate, format = "%d/%m/%Y"), .SDcols = Dates]
# First question
df[is.na(drug.end), drug.end := End.date]
# Second question
df[Diag_date >= Drug.start & Diag_date <= drug.end, Event := 1L]
source to share
Akrun's answer is enough for the problem. By offering more direct code.
A <- read.table("clipboard", header = T)
Dates <- c("Diag_date", "End.date", "Drug.start", "drug.end")
A[,Dates] <- lapply(A[,Dates],function(x) as.Date(x, format = "%d/%m/%Y"))
A$drug.end[is.na(A$drug.end)] <- as.character(A$End.date[is.na(A$drug.end)])
A$Event <- as.numeric((A$Diag_date >= A$Drug.start & A$Diag_date<=A$drug.end) & !is.na(A$Diag_date))
source to share