R - delete "one by one" duplicates

Question

I am trying to find a way to remove consecutive duplicates in R. I have a zoo object like:

2015-01-01 12:00:00    1
2015-01-01 13:00:00    1
2015-01-01 14:00:00    1
2015-01-01 15:30:00    4
2015-01-01 16:00:00    1
2015-01-01 17:00:00    6

and my expected output:

2015-01-01 12:00:00    1
2015-01-01 15:30:00    4
2015-01-01 16:00:00    1
2015-01-01 17:00:00    6

When I use the duplicated function, it also removes duplicates (1) that do not appear consecutively.

Can anyone give me a hint on how to write this, or tell me whether a function for it already exists?

+3

r duplicates zoo

HansHupe May 13 '15 at 10:39

2 answers

Using dplyr and lubridate, you can do it like this:

library(dplyr)
library(lubridate)

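# Example data: timestamps and their values
DF <- data.frame(Date=c("2015-01-01 12:00:00",
                        "2015-01-01 13:00:00","2015-01-01 15:30:00"),
                 name1=c(1, 1, 4))

DF %>%
  # parse the timestamps so that date arithmetic works
  mutate(Date = ymd_hms(as.character(Date))) %>%
  # keep the first row, plus any row more than one hour after the previous row
  filter(Date - hours(1) > lag(Date) | is.na(lag(Date)))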

dplyr allows you to refer to the row above (lag) and lubridate allows you to calculate with dates.
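Note that the filter above compares timestamps, not values. If the goal is to drop a row whenever its value equals the value in the previous row, the same lag idea can be applied to the value column instead. This is only a sketch, assuming the data sits in a data frame with the column names used above (Date, name1) and that "duplicate" means "same value as the previous row":

```r
library(dplyr)

# Same data as in the question, as a data frame (column names are assumptions)
DF <- data.frame(
  Date  = as.POSIXct(c("2015-01-01 12:00:00", "2015-01-01 13:00:00",
                       "2015-01-01 14:00:00", "2015-01-01 15:30:00",
                       "2015-01-01 16:00:00", "2015-01-01 17:00:00"),
                     tz = "UTC"),
  name1 = c(1, 1, 1, 4, 1, 6)
)

# Keep the first row (its lag is NA) and every row whose value
# differs from the value in the row before it
result <- DF %>%
  filter(is.na(lag(name1)) | name1 != lag(name1))

print(result)
```

With this data, the rows at 13:00 and 14:00 are dropped, while the later 1 at 16:00 is kept because it follows a 4.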

+2

pfuhlert May 13 '15 at 11:09

James · Accepted Answer · 2015-05-13T10:49:28+0000

You can use the lengths from run length encoding (rle) to select the rows you want. Applying cumsum to the raw lengths gives the index of the last value in each run, but you can get the index of the first by subtracting the lengths from the cumulative sum and adding one.

x <- data.frame(Date=Sys.Date()+0:5,Value=c(1,1,1,4,1,6))
lens <- rle(x$Value)$lengths
select <- cumsum(lens)-lens+1
x[select,]
        Date Value
1 2015-05-13     1
4 2015-05-16     4
5 2015-05-17     1
6 2015-05-18     6
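The index arithmetic can be wrapped in a small helper so that it is easy to reuse. The function name first_of_runs is hypothetical (not part of base R or zoo), and this is a minimal sketch rather than a tested utility:

```r
# Hypothetical helper: index of the first element of each run of equal values
first_of_runs <- function(values) {
  lens <- rle(values)$lengths   # length of each run of identical values
  cumsum(lens) - lens + 1       # end of each run, moved back to its start
}

values <- c(1, 1, 1, 4, 1, 6)
idx <- first_of_runs(values)

print(idx)          # 1 4 5 6
print(values[idx])  # 1 4 1 6

# Assumption: for a zoo object z with the zoo package loaded, the same
# indices could be used as z[first_of_runs(as.vector(coredata(z)))]
```

For the example values, the runs have lengths 3, 1, 1, 1, so cumsum gives 3, 4, 5, 6 and subtracting the lengths and adding one gives 1, 4, 5, 6.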
