Removing rows with repeated consecutive entries from an R data frame without a loop

I am working with a data frame that records entries and exits. Sometimes the entry criterion is met several times before the exit criterion occurs, and vice versa. I would like to delete these repeated instances. For example, in the data there is an entry on row 1 and another on row 2; the second entry, on row 2, should be removed, since we are already "in". Likewise, rows 6 and 7 should be removed, since we are already "out" and there has been no new entry. It is also worth mentioning that there can be no exit without an entry.

I know I can do this with a for loop, but I would like to avoid one if possible. I've tried using cumsum to keep only the rows where the running total of "In" + "Out" is 0 or 1 and remove everything else. This approach doesn't work.
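For illustration, here is one possible reading of that cumsum attempt. This is an assumption, since the code that was actually tried is not shown, and the names running and kept are made up for the example. The running total drifts as soon as an entry or an exit repeats, so filtering on 0 or 1 keeps the wrong rows:

```r
# Rebuild the example data so the snippet runs on its own.
temp <- data.frame(In  = c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0),
                   Out = c(0, 0, -1, 0, -1, -1, -1, 0, -1, -1))

# Running total of entries (+1) and exits (-1).
running <- cumsum(temp$In + temp$Out)
# running is 1 2 1 2 1 0 -1 0 -1 -2

# Assumed filter: keep rows where the running total is 0 or 1.
kept <- which(running %in% c(0, 1))
# kept is 1 3 5 6 8, but the desired rows are 1 3 4 5 8 9
```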

Original data frame:

   In Out
1   1   0
2   1   0
3   0  -1
4   1   0
5   0  -1
6   0  -1
7   0  -1
8   1   0
9   0  -1
10  0  -1

      

Desired output:

   In Out
1   1   0
3   0  -1
4   1   0
5   0  -1
8   1   0
9   0  -1

      

Code to create the original dataframe:

temp <- structure(list(In = c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0), Out = c(0, 
0, -1, 0, -1, -1, -1, 0, -1, -1)), .Names = c("In", "Out"), row.names = c(NA, 
10L), class = "data.frame")

      

Thank you for your help.



3 answers


Try

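 library(data.table)  # rleid() needs v1.9.5+
 # Label each run of identical Out values, keep the first row of each run, then drop the label
 setDT(temp)[, ind := rleid(Out)][, .SD[1L], by = ind][, ind := NULL][]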
 #    In Out
 #1:  1   0
 #2:  0  -1
 #3:  1   0
 #4:  0  -1
 #5:  1   0
 #6:  0  -1

      



Or based on @Arun's comment

 setDT(temp)[, .SD[1L], by = list(ind=rleid(Out)), .SDcols=1:2][,ind:= NULL][]
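If data.table is not available, the same run-based idea can be sketched in base R. This is an illustrative equivalent rather than part of the answer above: rle() gives the run lengths of Out, a run id similar to what rleid() returns is rebuilt from them, and the first row of each run is kept. The names runs, run_id and result are made up for the example.

```r
# Rebuild the example data so the snippet runs on its own.
temp <- data.frame(In  = c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0),
                   Out = c(0, 0, -1, 0, -1, -1, -1, 0, -1, -1))

# Run-length encode Out, then expand to one run id per row.
runs   <- rle(temp$Out)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Keep only the first row of each run.
result <- temp[!duplicated(run_id), ]
```

With the example data, result holds rows 1, 3, 4, 5, 8 and 9, matching the desired output.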

      



Here's another solution:



temp[c(TRUE,temp$In[-length(temp$In)]!=temp$In[-1]),]
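To see why this works, the index can be unpacked step by step. The sketch below uses hypothetical names (n, prev, curr, keep) and assumes, as in the example data, that every row is either an entry or an exit, so a change in In marks the start of a new run:

```r
# Rebuild the example data so the snippet runs on its own.
temp <- data.frame(In  = c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0),
                   Out = c(0, 0, -1, 0, -1, -1, -1, 0, -1, -1))

n    <- nrow(temp)
prev <- temp$In[-n]   # In for rows 1 .. n-1
curr <- temp$In[-1]   # In for rows 2 .. n

# The first row is always kept; later rows are kept only when In changes.
keep <- c(TRUE, prev != curr)

result <- temp[keep, ]
```

The same comparison could also be written as c(TRUE, diff(temp$In) != 0).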

      



A simple data.table solution that does not require v1.9.5:

setDT(temp)[c( TRUE , In[-.N] != In[-1] )]

      
