R comparing the values ​​in the previous lines

I have data like this:

Incident.ID.. = c(rep("INCFI0000029582",4), rep("INCFI0000029587",4))
date = c("2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-10-10 23:04:00", "2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24", "2014-10-10 23:04:00")
status = c("assigned", "in.progress", "resolved", "closed", "assigned", "resolved", "resolved", "closed")
date.diff=c (3, 56, 1347796,0 ,69 ,43, 1348116, 0)
df = data.frame(Incident.ID..,date, status, date.diff, stringsAsFactors = FALSE)

df
    Incident.ID..                date      status date.diff
1 INCFI0000029582 2014-09-25 08:39:45    assigned         3
2 INCFI0000029582 2014-09-25 08:39:48 in.progress        56
3 INCFI0000029582 2014-09-25 08:40:44    resolved   1347796
4 INCFI0000029582 2014-10-10 23:04:00      closed         0
5 INCFI0000029587 2014-09-25 08:33:32    assigned        69
6 INCFI0000029587 2014-09-25 08:34:41    resolved        43
7 INCFI0000029587 2014-09-25 08:35:24    resolved   1348116
8 INCFI0000029587 2014-10-10 23:04:00      closed         0

      

And I would like to select only rows with "resolved" status for a specific incident .ID .. when it is not followed by the status of the same incident .ID .. "closed" (there can be only rows with "allowed" or only "closed", so that somehow Incident.ID .. should be the same when comparing).

For example, here in the data for this example, only this line will be selected:

6 INCFI0000029587 2014-09-25 08:34:41    resolved        43 

      

So how can I do this?

+3


source to share


2 answers


Here's a simple approach using dplyr to group the data using the incident id and then filter (select rows) using the "lead" function to find the next row:



library(dplyr)
df %>%
  group_by(Incident.ID..) %>%
  filter(status == "resolved" & lead(status) != "closed")  # you can add %>% ungroup() if required
#Source: local data frame [1 x 4]
#Groups: Incident.ID..
#
#    Incident.ID..                date   status date.diff
#1 INCFI0000029587 2014-09-25 08:34:41 resolved        43

      

+3


source


library(data.table) #using the development version of data.table
setDT(df)[, .SD[status == "resolved" & shift(status, type = "lead") != "closed"], by = Incident.ID..]
     Incident.ID..                date   status date.diff
1: INCFI0000029587 2014-09-25 08:34:41 resolved        43

      



PS updated as per @David comment

+3


source







All Articles