Create a sliding index of pairs over groups

I need to create (with R) a sliding index of pairs from a dataset that includes groups. Consider the following dataset:

times <- c(4,3,2)
V1 <- unlist(lapply(times, function(x) seq(1, x)))
df <- data.frame(group = rep(1:length(times), times = times), 
                 V1 = V1, 
                 rolling_index = c(1,1,2,2,3,3,4,5,5))

df
  group V1 rolling_index
1     1  1             1
2     1  2             1
3     1  3             2
4     1  4             2
5     2  1             3
6     2  2             3
7     2  3             4
8     3  1             5
9     3  2             5

      

In a dataframe, I have a group of variables and a V1. Within each group, V1 denotes the current index (which may or may not start at 1).

I want to create a new indexing variable that looks like roll_index. This variable groups rows within one group and sequential V1 value, thus creating a new rolling index. This new index must be consistent across groups. If there is an uneven number of rows in a group (i.e. group 2), then the last, one row gets its own rolling index value.

+3


source to share


2 answers


You may try

library(data.table)
setDT(df)[,  gr:=as.numeric(gl(.N, 2, .N)), group][, 
     rollindex:=cumsum(c(TRUE,abs(diff(gr))>0))][,gr:= NULL]
#    group V1 rolling_index rollindex
#1:     1  1             1         1
#2:     1  2             1         1
#3:     1  3             2         2
#4:     1  4             2         2
#5:     2  1             3         3
#6:     2  2             3         3
#7:     2  3             4         4
#8:     3  1             5         5
#9:     3  2             5         5

      

Or using base R

 indx1 <- !duplicated(df$group)
 indx2 <- with(df, ave(group, group, FUN=function(x)
                           gl(length(x), 2, length(x))))
 cumsum(c(TRUE,diff(indx2)>0)|indx1)
 #[1] 1 1 2 2 3 3 4 5 5

      

Update



The above methods are based on the "group" column. Assuming you already have a sequence column ('V1') by group as shown in the example, creating an accelerated index is easier

 cumsum(!!df$V1 %%2)
 #[1] 1 1 2 2 3 3 4 5 5

      

As mentioned in the post, if the "V1" column does not start with "1" for some groups, we can get the sequence from the "group" and then do cumsum

as above

 cumsum(!!with(df, ave(seq_along(group), group, FUN=seq_along))%%2)
 #[1] 1 1 2 2 3 3 4 5 5

      

+5


source


There is probably an easier way, but you can do:



rep_each <- unlist(mapply(function(q,r) {c(rep(2, q),rep(1, r))}, 
                          q=table(df$group)%/%2,
                          r=table(df$group)%%2))

df$rolling_index <- inverse.rle(x=list(lengths=rep_each, values=seq(rep_each)))

df$rolling_index 
#[1] 1 1 2 2 3 3 4 5 5

      

+5


source







All Articles