Getting the index of the row where the new value starts

Question

I have a simple data.frame like the one shown below. I want to get the indices of all rows where a new origin value starts. In this case that would be 1, 5, and 8. Is there a way to do this without a loop?

df <- data.frame(origin=c(rep('2016-01-01', 4), rep('2016-02-01',3), rep('2016-03-01',2)), 
  date=c('2016-01-01','2016-02-01','2016-03-01','2016-04-01','2016-02-01','2016-03-01','2016-04-01','2016-03-01','2016-04-01'),
  val=rnorm(9))

df$date <- as.Date(df$date)
df$origin <- as.Date(df$origin)

df
      origin       date        val
1 2016-01-01 2016-01-01 -2.0856573
2 2016-01-01 2016-02-01 -0.5930160
3 2016-01-01 2016-03-01  0.5370460
4 2016-01-01 2016-04-01  1.5539720
5 2016-02-01 2016-02-01  0.4866211
6 2016-02-01 2016-03-01 -0.1443780
7 2016-02-01 2016-04-01 -0.9286197
8 2016-03-01 2016-03-01 -0.6311255
9 2016-03-01 2016-04-01  1.1667005

r

Gaurav bansal Apr 07 17 at 21:33

3 answers

Another option uses rle and cumsum. We prepend a 1 with c() because the first row always starts a run, and then we drop the last run length (since no new run starts after it). A bit esoteric, but:

date_runs <- rle(as.character(df$origin))
cumsum(c(1,date_runs[[1]][-length(date_runs[[1]])]))
##[1] 1 5 8
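
To see why this works, it may help to look at the intermediate values. The following is a minimal sketch that rebuilds only the origin column as a character vector (the names origin, runs and starts are just for illustration) and uses head(..., -1) as an equivalent way to drop the last run length:

```r
# Same origin values as in the question, kept as character
origin <- c(rep("2016-01-01", 4), rep("2016-02-01", 3), rep("2016-03-01", 2))

# rle() returns the length of each run of identical consecutive values
runs <- rle(origin)
print(runs$lengths)
# [1] 4 3 2

# Row 1 starts the first run; each later run starts one run length
# after the previous start, so the last length is not needed
starts <- cumsum(c(1, head(runs$lengths, -1)))
print(starts)
# [1] 1 5 8
```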

Mike H. Apr 07 17 at 22:44

You can use functions from the dplyr package:

library(dplyr)
df %>%
  group_by(origin) %>%
  slice(1)

johnckane Apr 07 17 at 21:38

db · Accepted Answer · 2017-04-07T21:35:40+0000

which(!duplicated(df$origin))
#[1] 1 5 8

If a value may reappear later in the column (or the column is not sorted), use the following instead to find where each new run of values begins.

which(c(TRUE, as.character(df$origin)[-NROW(df)] != as.character(df$origin)[-1]))
#[1] 1 5 8
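
The two approaches give the same result on the question's data, but they differ once a value comes back after a gap. A small sketch on a made-up vector x (not from the question) shows the difference:

```r
# Hypothetical vector in which "a" reappears after "b"
x <- c("a", "a", "b", "b", "a", "c", "c")

# First occurrence of each distinct value
first_seen <- which(!duplicated(x))
print(first_seen)
# [1] 1 3 6

# Start of each run of identical consecutive values:
# compare every element with the one before it
run_starts <- which(c(TRUE, x[-length(x)] != x[-1]))
print(run_starts)
# [1] 1 3 5 6
```

The second "a" at position 5 is skipped by duplicated() because "a" was already seen, while the neighbour comparison reports it as the start of a new run.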
