R: window function

I have a data frame DF with three columns and n rows, shown below:

Month Year  Default
1   2015    T
2   2015    T
3   2015    F
4   2015    T
5   2015    T
6   2015    T
7   2015    F

      

I would like to check whether Default contains 3 or more consecutive T values (a run of at least 3 Ts), and then store the month and year at which each such run starts in a new data frame.

For the data above, the output should look like this:

Month   Year
4   2015
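For anyone who wants to try the answers below, the sample data can be entered directly. This sketch assumes Default is stored as a logical column (TRUE/FALSE), which is what the T/F values in the table suggest.

```r
# Rebuild the example data from the question
# (assumption: Default is a logical column, not character "T"/"F")
DF <- data.frame(
  Month   = 1:7,
  Year    = 2015,
  Default = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)
str(DF)
```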

      



3 answers


Here's an attempt using data.table, specifically the development version on GitHub and its new function rleid:

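library(data.table) # v 1.9.5+
# Give each run of identical consecutive Default values its own id
setDT(DF)[, indx := rleid(Default)]
# Among TRUE rows, keep the first row of every run longer than 2
DF[(Default), if (.N > 2) .SD[1L], by = indx]
#    indx Month Year Default
# 1:    3     4 2015    TRUE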

Basically, we first create a unique index for each run of identical consecutive values in Default. Then, looking only at rows where Default == TRUE, we check each group: if its size is greater than 2, we select the first row of that group.
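The same steps can be sketched in base R for readers without data.table. Here run_id() is a hypothetical helper standing in for rleid() on a single vector; ave() gives each row the size of its run, and duplicated() marks every row except the first one of each run.

```r
# run_id() is a hypothetical helper that mimics data.table::rleid()
# for one vector: the id increases by 1 each time the value changes.
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  cumsum(c(TRUE, x[-1] != x[-length(x)]))
}

DF <- data.frame(Month = 1:7, Year = 2015,
                 Default = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE))

ids   <- run_id(DF$Default)                      # 1 1 2 3 3 3 4
sizes <- ave(seq_along(ids), ids, FUN = length)  # size of each row's run
first <- !duplicated(ids)                        # TRUE on the first row of each run

# Keep the first row of every TRUE run longer than 2
res <- DF[DF$Default & first & sizes > 2, c("Month", "Year")]
res
```

On the example data this keeps only row 4 (Month 4, Year 2015), matching the data.table result.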




A shorter version (suggested by @Arun) would be:

setDT(DF)[, if (Default && .N > 2L) .SD[1L], by = .(indx = rleid(Default), Default)]



This may not be the best solution, but my first attempt would be to:
- collapse the third column into a single string;
- use a regular expression (gregexpr) to find all occurrences of "TTT" in that string, which gives you a vector of starting positions;
- use this vector to subset the original data by row, omitting the last column.

EDIT



Now with the code:

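# One character per row: TRUE -> "1", FALSE -> "0", e.g. "1101110"
def_str <- paste(as.integer(DF$Default), collapse = "")
# Start position of each run of three or more 1s
indices <- unlist(gregexpr("111+", def_str))
if (indices[1] != -1) {
  # if there is no match, indices will be -1
  DF[indices, -3]
} else {
  print("something dramatic about no 3 rolling months of T's")
}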
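A note on the pattern: gregexpr() reports non-overlapping matches, so searching for a plain "111" would report a long run more than once, while the greedy "111+" reports each run once. The small check below illustrates this. The approach also assumes Default contains no NA values, because as.integer(NA) pastes as the two characters "NA" and would shift every later position.

```r
# A run of six TRUEs followed by one FALSE
s <- paste(as.integer(c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)), collapse = "")
s  # "1111110"

# as.vector() drops the match.length/useBytes attributes
starts_fixed  <- as.vector(gregexpr("111", s)[[1]])   # two matches: 1 and 4
starts_greedy <- as.vector(gregexpr("111+", s)[[1]])  # one match: 1
starts_fixed
starts_greedy
```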

      



Here is a way to do it with rle in base R, without data.table (although data.table is a very sweet package!). Sometimes people just want to use base R without other dependencies.

dt <- data.frame(Month = 1:7, Year = 2015, Default = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE))

# Run-length encoding of Default: the length and value of each run
runData <- rle(dt$Default)

# Runs of TRUE that are at least 3 long
whichThree <- which(runData$lengths >= 3 & runData$values)

# First row of each such run = rows in all earlier runs + 1
idx <- c(0, cumsum(runData$lengths))[whichThree] + 1

dt[idx, 1:2]
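A possible generalization wraps the rle idea in a reusable function. The name run_starts() and its min_len argument are hypothetical; computing the start rows from cumsum() means a run that begins at row 1 is handled as well.

```r
# Hypothetical helper: first row of every TRUE run with at least min_len rows
run_starts <- function(x, min_len = 3) {
  r <- rle(x)
  # first row of each run = number of rows in all earlier runs + 1
  first_row <- c(0L, cumsum(r$lengths))[seq_along(r$lengths)] + 1L
  first_row[r$values & r$lengths >= min_len]
}

run_starts(c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE))  # 4
run_starts(c(TRUE, TRUE, TRUE, FALSE))                     # 1
run_starts(c(FALSE, FALSE))                                # integer(0)
```

With the data frame above, dt[run_starts(dt$Default), 1:2] returns the month and year where each qualifying run starts.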

      
