Find start and end positions/indices of runs/consecutive values

Problem: Given an atomic vector, find the start and end indices of the runs in the vector.

An example of a vector with runs:

x = rev(rep(6:10, 1:5))
# [1] 10 10 10 10 10  9  9  9  9  8  8  8  7  7  6

      

Output from rle():

rle(x)
# Run Length Encoding
#  lengths: int [1:5] 5 4 3 2 1
#  values : int [1:5] 10 9 8 7 6

      

Desired output:

#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15

      

The base rle class does not provide this functionality, but the class Rle and the function rle2 do. However, given how minor the functionality is, sticking to base R seems more sensible than installing and loading additional packages.

There are code snippets elsewhere (including on SO) that solve a slightly different problem: finding start and end indices for runs that satisfy some condition. I wanted something more general that could be done in one line and did not involve assigning temporary variables or values.

Answering my own question because I was disappointed by the lack of search results. Hope this helps someone!

2 answers


Basic logic:

# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)

# Compute endpoints of run
end = cumsum(rle_x$lengths)
start = c(1, head(end, -1) + 1)  # base R: each run starts one past the previous run's end

# Display results
data.frame(start, end)
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15

      

The tidyverse/dplyr way (data frame-centric):



library(dplyr)

rle(x) %>%
  unclass() %>%
  as.data.frame() %>%
  mutate(end = cumsum(lengths),
         start = c(1, dplyr::lag(end)[-1] + 1)) %>%
  magrittr::extract(c(1,2,4,3)) # To re-order start before end for display

      

Since the start and end vectors have the same length as the values component of the rle object, the related problem of finding the endpoints of runs that meet some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
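A minimal base R sketch of that filtering step, assuming we only want runs whose value is greater than 7. The threshold and the names keep and runs are illustrative and not part of the original answer:

```r
x <- rev(rep(6:10, 1:5))
rle_x <- rle(x)

# Endpoints of every run
end <- cumsum(rle_x$lengths)
start <- end - rle_x$lengths + 1

# Illustrative condition: keep only runs whose value is greater than 7
keep <- rle_x$values > 7

# One row per run, subset by the condition on the run values
runs <- data.frame(value = rle_x$values, start, end)[keep, ]
runs
#   value start end
# 1    10     1   5
# 2     9     6   9
# 3     8    10  12
```

Any logical vector of the same length as rle_x$values would work in place of keep, since the rows line up one-to-one with the runs.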



A data.table possibility, where .I and .N are used to pick the relevant indices for each group defined by rleid.



library(data.table)
data.table(x)[ , .(start = .I[1], end = .I[.N]), by = rleid(x)][, rleid := NULL][]
#    start end
# 1:     1   5
# 2:     6   9
# 3:    10  12
# 4:    13  14
# 5:    15  15
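To see what the grouping is doing, the same idea can be sketched in base R: the run ids are rebuilt from rle, and the first and last position within each id give the endpoints. This is only an illustration of the logic, not how data.table implements it, and the names run_id, pos and bounds are made up for the example:

```r
x <- rev(rep(6:10, 1:5))
r <- rle(x)

# One id per run, repeated over the run's length
# (for this x, the same ids rleid(x) would return)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Positions 1..length(x), playing the role of .I
pos <- seq_along(x)

# First and last position within each run, like .I[1] and .I[.N]
bounds <- data.frame(
  start = as.vector(tapply(pos, run_id, min)),
  end   = as.vector(tapply(pos, run_id, max))
)
bounds
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15
```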

      
