Count consecutive occurrences of a specific value in each row of a data frame in R

I have a data.frame of monthly variable values for many locations (so many rows), and I want to count the number of consecutive months (i.e. consecutive cells) that have a value of 0. It would be easy if each row were just read from left to right, but the added complication is that the end of the year wraps around to the beginning of the year.

For example, in the abbreviated example dataset below (with seasons, not months), location 1 has 3 consecutive '0' months, location 2 has 2, and location 3 has none.

df<-cbind(location= c(1,2,3),
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))

      

How can I count these consecutive zero values? I looked at rle, but I am still none the wiser!

Thanks a lot for any help :)



2 answers


You've identified two cases in which the longest run can occur: (1) somewhere in the middle of a row, or (2) split between the end and the start of a row. Hence, you want to calculate both cases and take the max, like so:

df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))

#>      Winter Spring Summer Autumn
#> [1,]      0      0      0      3
#> [2,]      0      2      2      0
#> [3,]      3      4      7      4


# calculate the number of consecutive zeros at the start and end
startZeros  <-  apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros  <-  apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0

# calculate the longest run of zeros
longestRun  <-  apply(df,1,function(x){
                y = rle(x);
                max(y$lengths[y$values==0],0)})
#> [1] 3 1 0

# take the max of the two values
pmax(longestRun, startZeros + endZeros)
#> [1] 3 2 0
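
To see why both cases are needed, it may help to look at what rle returns for a single row. The following is only a sketch: it takes row 2 of the example as a plain vector and reads the wrap-around run straight from the rle output rather than using which.min as above. The names runs, inner and wrapped are illustrative and not part of the original answer.

```r
# Row 2 of the example: zeros at both ends, non-zeros in the middle
x <- c(Winter = 0, Spring = 2, Summer = 2, Autumn = 0)

# Run-length encoding: each distinct run and how long it is
runs <- rle(unname(x))
runs$values   # 0 2 0
runs$lengths  # 1 2 1

# Case 1: longest run of zeros without wrapping around the year
inner <- max(runs$lengths[runs$values == 0], 0)

# Case 2: a run of zeros at the end joins a run of zeros at the start
n <- length(runs$lengths)
wrapped <- if (n > 1 && runs$values[1] == 0 && runs$values[n] == 0) {
  runs$lengths[1] + runs$lengths[n]
} else {
  0
}

result <- max(inner, wrapped)
result  # 2
```

The check n > 1 keeps a row consisting of a single run (for example, all zeros) from being counted twice.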

      

Alternatively, there is an even simpler way:



longestRun  <-  apply(cbind(df,df), # bind df to itself so zeros at the end of a row run on into zeros at the start
                      1, # the margin over which to apply the summary function (1 = rows)
                      function(x){ # the summary function
                          y = rle(x);
                          max(y$lengths[y$values==0],
                              0) # include zero in case there are no zeros in y$values
                      })

      

Note that this solution works because my df does not include the location column.
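
One edge case worth checking with the doubling trick: a row that is zero in every period becomes one run twice as long as the year. A possible guard, sketched below on made-up data, is to cap the result at the number of columns with pmin. The matrix seasons and the helper longestZeroRun are hypothetical names used only for this illustration.

```r
# Hypothetical data: the second location is zero in every season
seasons <- cbind(Winter = c(0, 0),
                 Spring = c(0, 0),
                 Summer = c(3, 0),
                 Autumn = c(0, 0))

# Longest run of zeros in a vector (0 if there are none)
longestZeroRun <- function(x) {
  y <- rle(x)
  max(y$lengths[y$values == 0], 0)
}

# Doubling each row lets runs wrap around the year,
# but an all-zero row is then counted twice
uncapped <- apply(cbind(seasons, seasons), 1, longestZeroRun)
uncapped  # 3 8

# Cap at the number of periods in one year
capped <- pmin(uncapped, ncol(seasons))
capped    # 3 4
```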



Try the following:

df <- data.frame(location = c(1, 2, 3),
                 Winter = c(0, 0, 3),
                 Spring = c(0, 2, 4),
                 Summer = c(0, 2, 7),
                 Autumn = c(3, 0, 4))

maxcumzero <- function(x) {
    l <- x == 0
    max(cumsum(l) - cummax(cumsum(l) * !l))
}

df$N.Consec <- apply(cbind(df[, -1], df[, -1]), 1, maxcumzero)

df
#   location Winter Spring Summer Autumn N.Consec
# 1        1      0      0      0      3        3
# 2        2      0      2      2      0        2
# 3        3      3      4      7      4        0

      



This adds a column to the data frame giving the length of the longest run of consecutive zeros in each row. The season columns are column-bound to themselves (with cbind) so that runs of zeros spanning autumn and winter are also detected.

The method used here is based on Martin Morgan's answer to a similar question.
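
For readers unfamiliar with the cumsum/cummax idiom, here is a step-by-step trace of what maxcumzero computes, using row 2 of the example after doubling. The intermediate names total, baseline and runlen are introduced only for this sketch.

```r
# Row 2 of the example (0, 2, 2, 0), bound to itself
x <- c(0, 2, 2, 0, 0, 2, 2, 0)

l <- x == 0                     # TRUE where the value is zero

# Running count of all zeros seen so far
total <- cumsum(l)              # 1 1 1 2 3 3 3 4

# At each non-zero position keep the running count, elsewhere 0;
# cummax then carries the count at the last non-zero forward
baseline <- cummax(total * !l)  # 0 1 1 1 1 3 3 3

# Zeros seen since the most recent non-zero value
runlen <- total - baseline      # 1 0 0 1 2 0 0 1

max(runlen)                     # 2
```

Subtracting the baseline resets the count after every non-zero value, so the maximum of runlen is the length of the longest run of zeros.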
