Run-length encoding on a data frame column in R
I have a data frame with a column of observations, each either 1 or 0. I want to count consecutive occurrences of 1, with the count resetting to 0 whenever a 0 occurs. The run-length encoding function (rle) seems to do something like this, but I cannot get its output into the format I want. I would like to do this without writing a custom function. In the data below, observation is the column I already have in the data frame, and continual is the column I want to derive and write back to the data frame. This link was a good start.
observation continual
0 0
0 0
0 0
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
1 11
1 12
0 0
0 0
You can do this quite easily in a couple of steps:
x <- rle(mydf$observation) ## run rle on the relevant column
new <- sequence(x$lengths) ## count 1, 2, ..., n within each run
new[mydf$observation == 0] <- 0 ## replace relevant values with zero
new
# [1] 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 0 0
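To check that the count really restarts, the same steps can be run on data with more than one run of 1s and the result written back to the data frame. This is only a sketch: the sample values below are hypothetical, not the data from the question.

```r
# Hypothetical sample data with two separate runs of 1s
mydf <- data.frame(observation = c(0, 1, 1, 1, 0, 0, 1, 1))

runs <- rle(mydf$observation)        # lengths: 1 3 2 2, values: 0 1 0 1
counter <- sequence(runs$lengths)    # 1 1 2 3 1 2 1 2
counter[mydf$observation == 0] <- 0  # zero out the positions that hold a 0
mydf$continual <- counter            # write the result back as a new column
mydf$continual
# [1] 0 1 2 3 0 0 1 2
```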
With the development version of data.table (v >= 1.9.5), you can try:
library(data.table) ## v >= 1.9.5
setDT(df)[, continual := seq_len(.N) * observation, by = rleid(observation)]
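For readers without data.table, the logic of that line can be imitated in base R. In the sketch below, run_id is a hypothetical stand-in for what rleid(observation) produces (one id per run), and the sample vector is made up for illustration.

```r
observation <- c(0, 1, 1, 1, 0, 0, 1, 1)  # hypothetical sample data

r <- rle(observation)
# one id per run, repeated over the run's length: 1 2 2 2 3 3 4 4
run_id <- rep(seq_along(r$lengths), r$lengths)

# within each run, count 1..n and multiply by the 0/1 value,
# so runs of 0s stay at 0 and runs of 1s count upward
continual <- ave(observation, run_id, FUN = function(v) seq_along(v) * v)
continual
# [1] 0 1 2 3 0 0 1 2
```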
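You can use a simple base R one-liner that relies on the fact that observation contains only 0 and 1, combined with a vectorized operation. Because cumsum never resets, this matches the desired output only when the 1s form a single run, as in the sample data: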
transform(df, continual=ifelse(observation, cumsum(observation), observation))
# observation continual
#1 0 0
#2 0 0
#3 0 0
#4 1 1
#5 1 2
#6 1 3
#7 1 4
#8 1 5
#9 1 6
#10 1 7
#11 1 8
#12 1 9
#13 1 10
#14 1 11
#15 1 12
#16 0 0
#17 0 0
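For data with several runs of 1s, one possible base R variant restarts the count after every 0 by grouping with ave. This is a sketch on hypothetical sample data, not part of the original answer; it also assumes observation contains only 0 and 1.

```r
df <- data.frame(observation = c(0, 1, 1, 1, 0, 0, 1, 1))  # hypothetical data

# the one-liner's expression: the count carries over into the second run
with(df, ifelse(observation, cumsum(observation), observation))
# [1] 0 1 2 3 0 0 4 5

# variant: every 0 opens a new group, so cumsum() restarts after each 0
df$continual <- ave(df$observation, cumsum(df$observation == 0), FUN = cumsum)
df$continual
# [1] 0 1 2 3 0 0 1 2
```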