Find the start and end of consecutive values below the threshold
I have a named vector vec containing observations for different dates:
vec <- c("20160101"=10,
"20160215"=35,
"20160315"=50,
"20160330"=75,
"20160410"=10,
"20160515"=60,
"20160605"=35,
"20160630"=30,
"20160725"=55,
"20160815"=28,
"20160905"=60,
"20161005"=80,
"20161115"=35,
"20161225"=15)
In the first step, I want to know how many runs are below a given threshold of 45 and have a minimum length of 2:
#threshold
thrs <- 45
#reclass and calculate runs
reclass <- vec
reclass[vec>thrs] <- 1
reclass[vec<=thrs] <- 0
runs <- rle(reclass)
below_thrs <- sum(runs$values[runs$lengths>=2] == 0)
> below_thrs
[1] 3
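The same count can also be sketched without the numeric recoding, by running rle() directly on the logical vector vec <= thrs. This is only a minimal sketch: the name below is an assumption introduced for illustration, and the minimum run length of 2 is taken from the description above.

```r
vec <- c("20160101"=10, "20160215"=35, "20160315"=50, "20160330"=75,
         "20160410"=10, "20160515"=60, "20160605"=35, "20160630"=30,
         "20160725"=55, "20160815"=28, "20160905"=60, "20161005"=80,
         "20161115"=35, "20161225"=15)
thrs <- 45

# TRUE where the observation is at or below the threshold
# ('below' is a hypothetical helper name; names are dropped for clarity)
below <- unname(vec <= thrs)

# Run-length encode the logical vector
runs <- rle(below)

# Count the runs that are below the threshold and at least 2 long
below_thrs <- sum(runs$values & runs$lengths >= 2)
print(below_thrs)
```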
Now I want to find the start and end dates of these three runs. Expected Result:
1, 20160101, 20160215
2, 20160605, 20160630
3, 20161115, 20161225
Any help is greatly appreciated.
1 answer
# Add a dummy element whose value (-1) falls in neither threshold class,
# because the last element of runs$lengths has a blank name
vec <- c(vec, "Dummy"=-1)
reclass <- c(vec)
reclass[vec>thrs] <- 1
reclass[vec<=thrs & vec>=0] <- 0 #be careful not to assign these categories to the dummy
runs <- rle(reclass)
Then, just by looking at the pattern in the output:
> runs$lengths
20160315 20160410 20160515 20160605 20160725 20160815 20160905 20161115    Dummy
       2        2        1        1        2        1        1        2        2        1
> runs$values
20160215 20160330 20160410 20160515 20160630 20160725 20160815 20161005 20161225    Dummy
       0        1        0        1        0        1        0        1        0       -1
> (endingDates<-names(runs$values[runs$values==0 & runs$lengths >=2]))
[1] "20160215" "20160630" "20161225"
> (offset<-runs$lengths[which(names(runs$values) %in% endingDates)]-1)
20160315 20160725    Dummy
       1        1        1
> (startingDates <- names(reclass)[which(names(reclass) %in% endingDates) - offset])
[1] "20160101" "20160605" "20161115"
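As an alternative that avoids the dummy element and the name matching, the start and end positions of each run can be derived from the cumulative sum of the run lengths. The following is a sketch under that assumption, not the approach used above: the names min_len, end_idx, start_idx, keep and result are introduced here for illustration only.

```r
vec <- c("20160101"=10, "20160215"=35, "20160315"=50, "20160330"=75,
         "20160410"=10, "20160515"=60, "20160605"=35, "20160630"=30,
         "20160725"=55, "20160815"=28, "20160905"=60, "20161005"=80,
         "20161115"=35, "20161225"=15)
thrs <- 45
min_len <- 2  # hypothetical name for the minimum run length

# Run-length encode the logical "at or below threshold" vector
runs <- rle(unname(vec <= thrs))

# Last and first position of every run in the original vector
end_idx   <- cumsum(runs$lengths)
start_idx <- end_idx - runs$lengths + 1

# Keep only runs that are below the threshold and long enough
keep <- runs$values & runs$lengths >= min_len

# Look up the dates from the names of vec
result <- data.frame(
  run   = seq_len(sum(keep)),
  start = names(vec)[start_idx[keep]],
  end   = names(vec)[end_idx[keep]],
  stringsAsFactors = FALSE
)
print(result)
```

Because the positions index into vec directly, this also works when the final run ends on the last element, which is the case the dummy element was added to handle.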