Comparing dates within a group using one link
I have a datasheet for different patients ("Spell") and several temperature parameters ("Temp") for each patient ("Episode"). I also have the date and time at which each temperature was taken.
Spell Episode Date Temp
1 3 2-1-17 21:00 40
1 2 2-1-17 20:00 36
1 1 1-1-17 10:00 37
2 3 2-1-17 15:00 36
2 2 2-1-17 10:00 37
2 1 1-1-17 8:00 36
3 1 3-1-17 10:00 40
4 3 4-1-17 15:00 36
4 2 3-1-17 12:00 40
4 1 3-1-17 10:00 39
5 7 3-1-17 17:30 36
5 6 2-1-17 17:00 36
5 5 2-1-17 16:00 37
5 1 1-1-17 9:00 36
5 4 1-1-17 14:00 39
5 3 1-1-17 13:00 40
5 2 1-1-17 11:00 39
I am interested in taking all measurements 24 hours before the last one, I have grouped observations with a spell and a back date, but I am not sure how to do a group comparison using the same link (in this case, the first line for each groups). The result should be:
Spell Episode Date Temp
1 3 2-1-17 21:00 40
1 2 2-1-17 20:00 36
2 3 2-1-17 15:00 36
2 2 2-1-17 10:00 37
3 1 3-1-17 10:00 40
4 3 4-1-17 15:00 36
5 7 3-1-17 17:30 36
Any ideas would be needed to point me in the right direction.
Edit: The date is in the format dm-yy H: M. Here's the dput from the data:
structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L,
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00",
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00",
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00",
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00",
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L,
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode",
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA,
-17L), .internal.selfref = <pointer: 0x00000000001f0788>)
source to share
Solution using only data.table
:
# convert Date column to POSIXct
DT[,Date:=as.POSIXct(Date,format='%d-%m-%y %H:%M',tz='GMT')]
# filter the data.table
filteredDT <- DT[, .SD[as.numeric(difftime(max(Date),Date,units='hours')) <= 24], by = Spell]
> filteredDT
Spell Episode Date Temp
1: 1 3 2017-01-02 21:00:00 40
2: 1 2 2017-01-02 20:00:00 36
3: 2 3 2017-01-02 15:00:00 36
4: 2 2 2017-01-02 10:00:00 37
5: 3 1 2017-01-03 10:00:00 40
6: 4 3 2017-01-04 15:00:00 36
7: 5 7 2017-01-03 17:30:00 36
source to share
mydata$Date <- as.POSIXct(mydata$Date, format = '%d-%m-%y %H:%M', tz='GMT')
mydata <- mydata[with(mydata, order(Spell, -as.numeric(Date))),]
index <- with(mydata, tapply(Date, Spell, function(x){x >= max(x) - as.difftime(1, unit="days")}))
mydata[unlist(index),]
Spell Episode Date Temp
1: 1 3 2017-01-02 21:00:00 40
2: 1 2 2017-01-02 20:00:00 36
4: 2 3 2017-01-02 15:00:00 36
5: 2 2 2017-01-02 10:00:00 37
7: 3 1 2017-01-03 10:00:00 40
8: 4 3 2017-01-04 15:00:00 36
11: 5 7 2017-01-03 17:30:00 36
source to share
The solution below uses two functions from the Hadley Wickham package lubridate()
. This package is very handy when dealing with dates and times, so I wonder why it hasn't been used in any of the other answers.
Also data.table
used because OP provided sample class data data.table
.
library(data.table) # if not already loaded
# coerce Date to POSIXct
DT[, Date := lubridate::dmy_hm(Date)][
# for each, pick measurements within last 24 hours
, .SD[Date > max(Date) - lubridate::dhours(24L)], by = Spell][
# order, just for convenience
order(Spell, -Date)]
Spell Episode Date Temp
1: 1 3 2017-01-02 21:00:00 40
2: 1 2 2017-01-02 20:00:00 36
3: 2 3 2017-01-02 15:00:00 36
4: 2 2 2017-01-02 10:00:00 37
5: 3 1 2017-01-03 10:00:00 40
6: 4 3 2017-01-04 15:00:00 36
7: 5 7 2017-01-03 17:30:00 36
Note that the expected output as given by the OP shows an extra line (Spell 5, Episode 6) that is outside of the 24 hour window.
Data
As provided by the OP
DT <- structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L,
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00",
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00",
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00",
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00",
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L,
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode",
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA, -17L))
source to share