How to conditionally calculate the difference in column values between rows in R?

I have the following dataset (this is just a sample; the actual dataset has many more rows):

User  Time      Flag  TimeDifference              Expected output (seconds)
A     11:39:30  1
A     11:37:53  1
A     20:44:19  1
A     22:58:42  2     Calculate time difference   8063
A     23:01:54  1     Calculate time difference   192
B     23:03:00  1
B     23:03:33  1
B     23:03:53  1
B     15:00:42  3     Calculate time difference   28991
B     19:35:31  2     Calculate time difference   16489
B     19:35:34  1     Calculate time difference   3
C     10:19:06  1
C     10:59:50  1
C     10:59:50  1
C     12:16:36  1
C     12:16:36  1
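To make the example reproducible, the sample table can be typed into R as a data frame. This is a transcription of the table above; the name df is an assumption (it matches the name the second answer uses).

```r
# Sample data transcribed from the table in the question
# (5 rows for user A, 6 for B, 5 for C)
df <- data.frame(
  User = rep(c("A", "B", "C"), times = c(5, 6, 5)),
  Time = c("11:39:30", "11:37:53", "20:44:19", "22:58:42", "23:01:54",
           "23:03:00", "23:03:33", "23:03:53", "15:00:42", "19:35:31",
           "19:35:34", "10:19:06", "10:59:50", "10:59:50", "12:16:36",
           "12:16:36"),
  Flag = c(1, 1, 1, 2, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
```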

      

I need to calculate, for each user:

  • the time difference (in seconds) between rows whenever there is a "flag change", stored in a new "Time Difference" column,

  • i.e. whenever the flag changes (from 1 to 2, 2 to 3, 2 to 1, 3 to 1, etc.), I need the difference in the Time column between the row where the flag changed and the previous row.

  • My times are in hh:mm:ss format. Is there a loop function I can apply here?
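No explicit loop is needed for the time arithmetic, because R's date-time functions are vectorised. A minimal sketch using base R's as.difftime(), which parses each string and subtracts midnight, so hh:mm:ss values become plain seconds (the helper name to_seconds is hypothetical):

```r
# Hypothetical helper: "hh:mm:ss" strings -> seconds since midnight.
# as.difftime() parses each string and subtracts midnight of the same day,
# so the date never enters the result.
to_seconds <- function(x) {
  as.numeric(as.difftime(x, format = "%H:%M:%S", units = "secs"))
}

secs <- to_seconds(c("20:44:19", "22:58:42", "23:01:54"))

# Differences between consecutive rows, computed for the whole vector at once;
# gives 8063 and 192, matching the expected output for user A
abs(diff(secs))
```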

Any help is appreciated.



2 answers


One way to do this is to turn your time variable into a POSIXlt time object and calculate the time difference (for all rows) relative to a copy of the time variable shifted down by one row. Then use the flag variable to set the differences you don't need to NA. The important part is that you need to difference the flag variable (with diff) so that you know when your flag has changed.
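The flag-differencing step on its own, as a small sketch on the sample flags: diff() returns flag[i + 1] - flag[i], so adding 1 to the positions of its non-zero values gives the rows where the flag changed (the names changes and changed_rows are illustrative).

```r
flag <- c(1, 1, 1, 2, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1)

# diff() gives flag[i + 1] - flag[i]; a non-zero value means the flag changed
changes <- diff(flag)

# Shift by one so each index points at the row *after* the change,
# i.e. the row that should receive a time difference
changed_rows <- which(changes != 0) + 1
changed_rows
```

For the sample this gives rows 4, 5, 9, 10 and 11, which are exactly the rows with an expected output in the question.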

I am laying out all the steps here, but there is probably a faster way to do it:

# Create the data
flag <- c(1,1,1,2,1,1,1,1,3,2,1,1,1,1,1,1)
time <- c('11:39:30','11:37:53','20:44:19','22:58:42','23:01:54',
          '23:03:00','23:03:33','23:03:53','15:00:42','19:35:31',
          '19:35:34','10:19:06','10:59:50','10:59:50','12:16:36',
          '12:16:36')

# Shift the time down by one row (the first row has no previous time)
time_shift <- c(NA, head(time, -1))

# Turn into POSIXlt objects (strptime fills in a date, typically today's,
# for every value, so it cancels out in the difference)
time <- strptime(time, format='%H:%M:%S')
time_shift <- strptime(time_shift, format='%H:%M:%S')

# data.frame() stores the POSIXlt vectors as POSIXct columns
data <- data.frame(time, time_shift, flag)

# Calculate diffs: seconds to the previous row, and size of the flag change
data$time_diff <- as.numeric(abs(difftime(data$time, data$time_shift, units='secs')))
data$flag_diff <- c(NA, abs(diff(data$flag)))

# Set non 'flag change' diffs to NA
data$time_diff[which(data$flag_diff == 0)] <- NA

      

You probably want to remove the helper columns and convert time back to the original hh:mm:ss format, which you can do with:

data$time <- format(data$time, "%H:%M:%S")
data <- data[c('time', 'flag', 'time_diff')]

      

This will make the dataframe look like this:

       time flag time_diff
1  11:39:30    1        NA
2  11:37:53    1        NA
3  20:44:19    1        NA
4  22:58:42    2      8063
5  23:01:54    1       192
6  23:03:00    1        NA
7  23:03:33    1        NA
8  23:03:53    1        NA
9  15:00:42    3     28991
10 19:35:31    2     16489
11 19:35:34    1         3
12 10:19:06    1        NA
13 10:59:50    1        NA
14 10:59:50    1        NA
15 12:16:36    1        NA
16 12:16:36    1        NA
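The code above does not look at the User column, so a flag change between the last row of one user and the first row of the next would also get a time difference (it happens not to occur in this sample). A minimal sketch of one way to guard against that, assuming a user vector alongside flag and time (the names user, same_user, flag_changed and keep are illustrative, and tz = 'UTC' is an assumption to avoid daylight-saving shifts), is to require the previous row to belong to the same user:

```r
user <- rep(c('A', 'B', 'C'), times = c(5, 6, 5))
flag <- c(1,1,1,2,1,1,1,1,3,2,1,1,1,1,1,1)
time <- c('11:39:30','11:37:53','20:44:19','22:58:42','23:01:54',
          '23:03:00','23:03:33','23:03:53','15:00:42','19:35:31',
          '19:35:34','10:19:06','10:59:50','10:59:50','12:16:36',
          '12:16:36')

# Parse the times; every value gets the same date, which cancels out
tm <- as.POSIXct(time, format = '%H:%M:%S', tz = 'UTC')
n <- length(tm)

# Row i counts only if it has the same user as row i - 1 AND a different flag
same_user    <- c(FALSE, user[-1] == user[-n])
flag_changed <- c(FALSE, flag[-1] != flag[-n])
keep <- same_user & flag_changed

# Absolute difference in seconds to the previous row, NA everywhere else
time_diff <- rep(NA_real_, n)
time_diff[keep] <- abs(as.numeric(difftime(tm[keep], tm[which(keep) - 1],
                                           units = 'secs')))

data.frame(user, time = format(tm, '%H:%M:%S'), flag, time_diff)
```

On this sample the result is the same as the table above; the extra same_user check only matters when two consecutive users end and start with different flags.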

      



Another approach uses the plyr package. Some preprocessing may be needed first:

# Parse the hh:mm:ss strings directly into POSIXct; a date (typically
# today's) is filled in and also appears when the column is printed
df$Time <- as.POSIXct(df$Time, format = "%H:%M:%S")

sol <- function(d) {
    Time_difference <- numeric(nrow(d))
    # rows whose Flag differs from the previous row
    ind <- which(diff(d$Flag) != 0) + 1

    # calculate differences in time where a change in Flag was detected
    Time_difference[ind] <- as.numeric(abs(difftime(time1 = d$Time[ind],
                                                    time2 = d$Time[ind - 1],
                                                    units = "secs")))
    d$Time_Difference <- Time_difference
    return(d)
}

      



Now use the ddply function from the plyr package, which follows the split-apply-combine principle. It takes a data frame (df), splits it by a variable ("User" in this case), applies a function (sol in this case) to each subset, and then combines the results back into a single data frame.

library(plyr)
ddply(.data = df, .variables = "User", .fun = sol)

#    User     Time  Flag Time_Difference
#1     A  11:39:30    1               0
#2     A  11:37:53    1               0
#3     A  20:44:19    1               0
#4     A  22:58:42    2            8063
#5     A  23:01:54    1             192
#6     B  23:03:00    1               0
#7     B  23:03:33    1               0
#8     B  23:03:53    1               0
#9     B  15:00:42    3           28991
#10    B  19:35:31    2           16489
#11    B  19:35:34    1               3
#12    C  10:19:06    1               0
#13    C  10:59:50    1               0 
#14    C  10:59:50    1               0
#15    C  12:16:36    1               0
#16    C  12:16:36    1               0
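If you would rather avoid the plyr dependency, the same split-apply-combine step can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is a swapped-in alternative, not part of the original answer; the data setup repeats the sample so the block runs on its own, and tz = "UTC" is an assumption to sidestep daylight-saving shifts.

```r
df <- data.frame(
  User = rep(c("A", "B", "C"), times = c(5, 6, 5)),
  Time = c("11:39:30", "11:37:53", "20:44:19", "22:58:42", "23:01:54",
           "23:03:00", "23:03:33", "23:03:53", "15:00:42", "19:35:31",
           "19:35:34", "10:19:06", "10:59:50", "10:59:50", "12:16:36",
           "12:16:36"),
  Flag = c(1, 1, 1, 2, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
df$Time <- as.POSIXct(df$Time, format = "%H:%M:%S", tz = "UTC")

# Same per-user function as in the answer
sol <- function(d) {
  Time_difference <- numeric(nrow(d))
  ind <- which(diff(d$Flag) != 0) + 1
  Time_difference[ind] <- as.numeric(abs(difftime(d$Time[ind], d$Time[ind - 1],
                                                  units = "secs")))
  d$Time_Difference <- Time_difference
  d
}

# split by User, apply sol to each piece, then stack the pieces again
res <- do.call(rbind, lapply(split(df, df$User), sol))
rownames(res) <- NULL

# Show the times in the original hh:mm:ss form
res$Time <- format(res$Time, "%H:%M:%S")
res
```

split() orders the groups by sorted User value, so the rows come back in their original order only when the data are already sorted by User, as they are in this sample.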

      
