Find start and end positions/indices of runs/consecutive values
Problem: Given an atomic vector, find the start and end indices of the runs in the vector.
An example of a vector with runs:
x = rev(rep(6:10, 1:5))
# [1] 10 10 10 10 10 9 9 9 9 8 8 8 7 7 6
Output from rle():
rle(x)
# Run Length Encoding
# lengths: int [1:5] 5 4 3 2 1
# values : int [1:5] 10 9 8 7 6
Desired output:
# start end
# 1 1 5
# 2 6 9
# 3 10 12
# 4 13 14
# 5 15 15
The base rle class does not appear to provide this functionality, but the class Rle and the function rle2 (both from add-on packages) do. However, given how minor the functionality is, sticking to base R seems more sensible than installing and loading additional packages.
There are code snippets (here, here, and on SO) that solve a slightly different problem: finding start and end indices for runs that satisfy some condition. I wanted something more general that could be done in one line and did not involve assigning temporary variables or values.
Answering my own question because I was disappointed by the lack of search results. Hope this helps someone!
Basic logic:
# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)
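# Compute endpoints of each run
end = cumsum(rle_x$lengths)
# Each run starts one past the previous run's end. head(end, -1) is used
# instead of lag(): without dplyr loaded, lag() is stats::lag(), which
# does not shift the values of a plain vector
start = c(1, head(end, -1) + 1)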
# Display results
data.frame(start, end)
# start end
# 1 1 5
# 2 6 9
# 3 10 12
# 4 13 14
# 5 15 15
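The same logic can be wrapped in a small base-R helper. This is only a minimal sketch: the function name run_bounds is hypothetical (it is not part of base R or of the original post), and it shifts the end positions with head(end, -1) so that it does not depend on dplyr being loaded.

```r
# Hypothetical helper: start/end index of every run in an atomic vector
run_bounds = function(x) {
  end = cumsum(rle(x)$lengths)
  # Each run starts one past the previous run's end; the first starts at 1.
  # The trailing subset keeps `start` empty when `x` has no elements.
  start = c(1, head(end, -1) + 1)[seq_along(end)]
  data.frame(start, end)
}

x = rev(rep(6:10, 1:5))
bounds = run_bounds(x)
bounds
```

For the example vector this should reproduce the start/end table shown above.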
Tidyverse/dplyr way (data frame-centric):
library(dplyr)
rle(x) %>%
  unclass() %>%
  as.data.frame() %>%
  mutate(end = cumsum(lengths),
         start = c(1, dplyr::lag(end)[-1] + 1)) %>%
  magrittr::extract(c(1,2,4,3)) # To re-order start before end for display
Because the start and end vectors have the same length as the values component of the rle object, solving the related problem of identifying the endpoints of runs that meet some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
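As a sketch of that filtering step, the snippet below keeps only the runs whose value exceeds 7. The threshold and the names keep and runs_gt7 are illustrative assumptions, and the start positions are computed with base-R head() so the example runs without dplyr.

```r
x = rev(rep(6:10, 1:5))
rle_x = rle(x)
end = cumsum(rle_x$lengths)
start = c(1, head(end, -1) + 1)

# Condition on the run values (illustrative threshold)
keep = rle_x$values > 7

# Subset the endpoints with the same logical vector
runs_gt7 = data.frame(value = rle_x$values, start, end)[keep, ]
runs_gt7
#   value start end
# 1    10     1   5
# 2     9     6   9
# 3     8    10  12
```

Because keep is computed from rle_x$values, it lines up element by element with start and end, so no extra bookkeeping is needed.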