Comparing dates within a group using one link

I have a datasheet for different patients ("Spell") and several temperature parameters ("Temp") for each patient ("Episode"). I also have the date and time at which each temperature was taken.

Spell Episode         Date    Temp
 1       3       2-1-17 21:00   40
 1       2       2-1-17 20:00   36
 1       1       1-1-17 10:00   37
 2       3       2-1-17 15:00   36
 2       2       2-1-17 10:00   37
 2       1       1-1-17 8:00    36
 3       1       3-1-17 10:00   40
 4       3       4-1-17 15:00   36
 4       2       3-1-17 12:00   40
 4       1       3-1-17 10:00   39
 5       7       3-1-17 17:30   36
 5       6       2-1-17 17:00   36
 5       5       2-1-17 16:00   37
 5       1       1-1-17 9:00    36
 5       4       1-1-17 14:00   39
 5       3       1-1-17 13:00   40
 5       2       1-1-17 11:00   39

      

I am interested in taking all measurements 24 hours before the last one, I have grouped observations with a spell and a back date, but I am not sure how to do a group comparison using the same link (in this case, the first line for each groups). The result should be:

    Spell Episode         Date    Temp
 1       3       2-1-17 21:00   40
 1       2       2-1-17 20:00   36
 2       3       2-1-17 15:00   36
 2       2       2-1-17 10:00   37
 3       1       3-1-17 10:00   40
 4       3       4-1-17 15:00   36
 5       7       3-1-17 17:30   36

      

Any ideas would be needed to point me in the right direction.

Edit: The date is in the format dm-yy H: M. Here's the dput from the data:

structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L, 
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00", 
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00", 
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00", 
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00", 
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L, 
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode", 
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA, 
-17L), .internal.selfref = <pointer: 0x00000000001f0788>)

      

+3


source to share


4 answers


library(dplyr)

df %>% 
  mutate(Date2 = as.numeric(strptime(df$Date, "%d-%m-%Y %H:%M"))) %>% 
  group_by(Spell) %>% 
  filter(Date2 >= (max(Date2) - 60*60*24)) %>%
  select(-Date2)

      



+6


source


Solution using only data.table

:



# convert Date column to POSIXct
DT[,Date:=as.POSIXct(Date,format='%d-%m-%y %H:%M',tz='GMT')]
# filter the data.table
filteredDT <- DT[, .SD[as.numeric(difftime(max(Date),Date,units='hours')) <= 24], by = Spell]

> filteredDT
   Spell Episode                Date Temp
1:     1       3 2017-01-02 21:00:00   40
2:     1       2 2017-01-02 20:00:00   36
3:     2       3 2017-01-02 15:00:00   36
4:     2       2 2017-01-02 10:00:00   37
5:     3       1 2017-01-03 10:00:00   40
6:     4       3 2017-01-04 15:00:00   36
7:     5       7 2017-01-03 17:30:00   36

      

+5


source


mydata$Date <- as.POSIXct(mydata$Date, format = '%d-%m-%y %H:%M', tz='GMT')
mydata <- mydata[with(mydata, order(Spell, -as.numeric(Date))),]
index <- with(mydata, tapply(Date, Spell, function(x){x >= max(x) - as.difftime(1, unit="days")}))
mydata[unlist(index),]

    Spell Episode                Date Temp
1:      1       3 2017-01-02 21:00:00   40
2:      1       2 2017-01-02 20:00:00   36
4:      2       3 2017-01-02 15:00:00   36
5:      2       2 2017-01-02 10:00:00   37
7:      3       1 2017-01-03 10:00:00   40
8:      4       3 2017-01-04 15:00:00   36
11:     5       7 2017-01-03 17:30:00   36

      

+2


source


The solution below uses two functions from the Hadley Wickham package lubridate()

. This package is very handy when dealing with dates and times, so I wonder why it hasn't been used in any of the other answers.

Also data.table

used because OP provided sample class data data.table

.

library(data.table)   # if not already loaded
# coerce Date to POSIXct
DT[, Date := lubridate::dmy_hm(Date)][
  # for each, pick measurements within last 24 hours
  , .SD[Date > max(Date) - lubridate::dhours(24L)], by = Spell][
    # order, just for convenience
    order(Spell, -Date)]

      

   Spell Episode                Date Temp
1:     1       3 2017-01-02 21:00:00   40
2:     1       2 2017-01-02 20:00:00   36
3:     2       3 2017-01-02 15:00:00   36
4:     2       2 2017-01-02 10:00:00   37
5:     3       1 2017-01-03 10:00:00   40
6:     4       3 2017-01-04 15:00:00   36
7:     5       7 2017-01-03 17:30:00   36

      

Note that the expected output as given by the OP shows an extra line (Spell 5, Episode 6) that is outside of the 24 hour window.

Data

As provided by the OP

DT <- structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L, 
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00", 
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00", 
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00", 
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00", 
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L, 
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode", 
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA, -17L))

      

+1


source







All Articles