How to conditionally calculate the difference in column values between rows in R?
I have the following dataset (this is just a sample; the actual dataset has many more rows):
User  Time      Flag  Expected TimeDifference (seconds)
A     11:39:30  1
A     11:37:53  1
A     20:44:19  1
A     22:58:42  2     8063
A     23:01:54  1     192
B     23:03:00  1
B     23:03:33  1
B     23:03:53  1
B     15:00:42  3     28991
B     19:35:31  2     16489
B     19:35:34  1     3
C     10:19:06  1
C     10:59:50  1
C     10:59:50  1
C     12:16:36  1
C     12:16:36  1
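To make the question reproducible, the sample above can be entered directly in R. This is a minimal sketch; the data frame name df and keeping Time as plain character strings are assumptions (the second answer below also calls the data df).

```r
# Sample data from the question, entered as a data frame.
# Assumption: Time is kept as "hh:mm:ss" character strings.
df <- data.frame(
  User = rep(c("A", "B", "C"), times = c(5, 6, 5)),
  Time = c("11:39:30", "11:37:53", "20:44:19", "22:58:42", "23:01:54",
           "23:03:00", "23:03:33", "23:03:53", "15:00:42", "19:35:31",
           "19:35:34", "10:19:06", "10:59:50", "10:59:50", "12:16:36",
           "12:16:36"),
  Flag = c(1, 1, 1, 2, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
str(df)
```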
For each user, I need to:
- calculate the time difference (in seconds) between rows whenever there is a "flag change", and store it in a new "Time Difference" column,
- i.e. whenever the flag changes from 1 to 2, 2 to 3, 2 to 1 or 3 to 1, compute the difference in the Time column between the current row and the previous row (the row before the flag changed).
The times are in hh:mm:ss format. Is there a loop or a function I can apply here?
Any help is appreciated.
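Since the times are plain hh:mm:ss strings, one loop-free option is to convert them to seconds after midnight with vectorized string operations. This is a sketch assuming every value is strictly hh:mm:ss; hms_to_seconds is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: convert "hh:mm:ss" strings to seconds after midnight.
# Assumes every element has exactly three ":"-separated numeric fields.
hms_to_seconds <- function(hms) {
  # Split on ":" and stack the pieces into a 3-column numeric matrix
  parts <- matrix(as.numeric(unlist(strsplit(hms, ":", fixed = TRUE))),
                  ncol = 3, byrow = TRUE)
  parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]
}

# User A, rows 3 to 5 of the sample
secs <- hms_to_seconds(c("20:44:19", "22:58:42", "23:01:54"))
abs(diff(secs))  # 8063 192
```

The two differences match the expected values for user A's flag changes, and no explicit loop is needed because strsplit() and the arithmetic are vectorized.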
One way to do this is to turn your time variable into a POSIXlt time object and calculate the time difference (for all rows) relative to a shifted copy of the time variable. Then use the flag variable to set the differences you don't need to NA. The important part is that you need to take the differences of the flag variable (with diff) so that you know when the flag has changed.
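The flag-differencing step can be seen on its own using user B's flags from the question. In this small sketch, NA is prepended so the result lines up with the rows, and which() drops that NA for the first row.

```r
# Flags for user B from the question
flag <- c(1, 1, 1, 3, 2, 1)
# diff() gives the change from each row to the next; prepend NA for row 1
flag_diff <- c(NA, diff(flag))
flag_diff               # NA 0 0 2 -1 -1
# Rows where the flag differs from the previous row
which(flag_diff != 0)   # 4 5 6
```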
I am laying out all the steps here, but there is probably a faster way to do it:
# Create the data
flag <- c(1,1,1,2,1,1,1,1,3,2,1,1,1,1,1,1)
time <- c('11:39:30','11:37:53','20:44:19','22:58:42','23:01:54',
          '23:03:00','23:03:33','23:03:53','15:00:42','19:35:31',
          '19:35:34','10:19:06','10:59:50','10:59:50','12:16:36',
          '12:16:36')
# Shift the time down by one row (the first row has no predecessor)
time_shift <- c(NA, time[-length(time)])
# Turn into POSIXlt objects
time <- strptime(time, format = '%H:%M:%S')
time_shift <- strptime(time_shift, format = '%H:%M:%S')
data <- data.frame(time, time_shift, flag)
# Calculate diffs (absolute, in seconds)
data$time_diff <- as.numeric(abs(difftime(data$time, data$time_shift, units = 'secs')))
data$flag_diff <- c(NA, abs(diff(data$flag)))
# Set non 'flag change' diffs to NA
data$time_diff[which(data$flag_diff == 0)] <- NA
You probably want to remove the helper columns and convert time back to the original hh:mm:ss format, which you can do with:
data$time <- format(data$time, "%H:%M:%S")
data <- data[c('time', 'flag', 'time_diff')]
This will make the dataframe look like this:
time flag time_diff
1 11:39:30 1 NA
2 11:37:53 1 NA
3 20:44:19 1 NA
4 22:58:42 2 8063
5 23:01:54 1 192
6 23:03:00 1 NA
7 23:03:33 1 NA
8 23:03:53 1 NA
9 15:00:42 3 28991
10 19:35:31 2 16489
11 19:35:34 1 3
12 10:19:06 1 NA
13 10:59:50 1 NA
14 10:59:50 1 NA
15 12:16:36 1 NA
16 12:16:36 1 NA
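The code above does not look at User at all; it gives the right answer on this sample only because each user's first row has the same flag as the previous user's last row. A minimal per-user sketch in base R, assuming a data frame df with User, Time (hh:mm:ss strings) and Flag columns as in the question, uses ave() to do the lagged differences within each user. The helper lag_diff is a hypothetical name, and times are converted to seconds by string splitting rather than POSIXlt, which is a swapped-in technique.

```r
df <- data.frame(
  User = rep(c("A", "B", "C"), times = c(5, 6, 5)),
  Time = c("11:39:30", "11:37:53", "20:44:19", "22:58:42", "23:01:54",
           "23:03:00", "23:03:33", "23:03:53", "15:00:42", "19:35:31",
           "19:35:34", "10:19:06", "10:59:50", "10:59:50", "12:16:36",
           "12:16:36"),
  Flag = c(1, 1, 1, 2, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Seconds after midnight (assumes strict "hh:mm:ss" strings)
parts <- matrix(as.numeric(unlist(strsplit(df$Time, ":", fixed = TRUE))),
                ncol = 3, byrow = TRUE)
secs <- parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]

# Hypothetical helper: difference from the previous row, NA for the first row
lag_diff <- function(x) c(NA, diff(x))
# ave() applies lag_diff separately within each User group
time_diff <- ave(secs, df$User, FUN = lag_diff)
flag_diff <- ave(df$Flag, df$User, FUN = lag_diff)

# Keep the absolute time difference only where the flag changed
df$TimeDifference <- ifelse(!is.na(flag_diff) & flag_diff != 0,
                            abs(time_diff), NA)
df
```

Because every user's first row gets NA from lag_diff, no difference is ever computed across a user boundary, whatever the flags are.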
Some preprocessing may be needed first, to turn the Time strings into date-time objects:
# Parse "hh:mm:ss" into POSIXct (the current date is filled in)
df$Time <- as.POSIXct(df$Time, format = "%H:%M:%S")
sol <- function(d) {
  Time_difference <- numeric(nrow(d))
  # Row indices where Flag differs from the previous row
  ind <- which(diff(d$Flag) != 0) + 1
  # Calculate differences in time where a change in Flag was detected
  Time_difference[ind] <- abs(difftime(time1 = d$Time[ind], time2 = d$Time[ind - 1],
                                       units = "secs"))
  d$Time_Difference <- Time_difference
  return(d)
}
Now use the ddply function from the plyr package, which follows the split-apply-combine principle. It takes a data frame (df), splits it by a variable ("User" in this case), applies the function (sol in this case) to each subset, and then combines the results back into a single data frame.
library(plyr)
ddply(.data = df, .variables = "User", .fun = sol)
# User Time Flag Time_Difference
#1 A 11:39:30 1 0
#2 A 11:37:53 1 0
#3 A 20:44:19 1 0
#4 A 22:58:42 2 8063
#5 A 23:01:54 1 192
#6 B 23:03:00 1 0
#7 B 23:03:33 1 0
#8 B 23:03:53 1 0
#9 B 15:00:42 3 28991
#10 B 19:35:31 2 16489
#11 B 19:35:34 1 3
#12 C 10:19:06 1 0
#13 C 10:59:50 1 0
#14 C 10:59:50 1 0
#15 C 12:16:36 1 0
#16 C 12:16:36 1 0
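For readers without plyr, the same split-apply-combine pattern can be sketched in base R with split(), lapply() and do.call(rbind, ...). This assumes df is built from the question's sample with Time parsed by as.POSIXct(); sol is the function from this answer, with as.numeric() added so the stored value is a plain number. The rows come back grouped by User in sorted order, which matches the original order here because the sample is already sorted by user.

```r
# Sample data from the question
df <- data.frame(
  User = rep(c("A", "B", "C"), times = c(5, 6, 5)),
  Time = c("11:39:30", "11:37:53", "20:44:19", "22:58:42", "23:01:54",
           "23:03:00", "23:03:33", "23:03:53", "15:00:42", "19:35:31",
           "19:35:34", "10:19:06", "10:59:50", "10:59:50", "12:16:36",
           "12:16:36"),
  Flag = c(1, 1, 1, 2, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
# Parse hh:mm:ss into date-times (the current date is filled in)
df$Time <- as.POSIXct(df$Time, format = "%H:%M:%S")

sol <- function(d) {
  Time_difference <- numeric(nrow(d))
  # Rows whose flag differs from the previous row within this user
  ind <- which(diff(d$Flag) != 0) + 1
  Time_difference[ind] <- abs(as.numeric(
    difftime(d$Time[ind], d$Time[ind - 1], units = "secs")))
  d$Time_Difference <- Time_difference
  d
}

# Split by user, apply sol to each piece, then bind the pieces back together
result <- do.call(rbind, lapply(split(df, df$User), sol))
rownames(result) <- NULL
result
```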