How to find and remove columns containing at least k consecutive zeros in an R data.frame?

I have a huge data.frame with about 200 variables, each represented by a column. Unfortunately, the data comes from a poorly formatted data dump (and therefore cannot be changed) that represents both missing values and zeros as 0. The data was observed every 5 minutes for a month, and a full day of 0s can only reasonably be interpreted as a day when the meter was not functioning, which leads to the conclusion that those 0s are in fact NAs.

I want to find (and delete) columns that have at least 288 consecutive 0s at any point. Or, more generally, how do I remove columns from a data.frame that contain >= k consecutive 0s?

I am relatively new to R and any help would be greatly appreciated. Thanks!

EDIT: Here's a reproducible example. Given k = 4, I would like to delete columns A and B (but not C, as its 0s are not consecutive).

df<-data.frame(A=c(4,5,8,2,0,0,0,0,6,3), B=c(3,0,0,0,0,6,8,2,1,0), C=c(4,5,6,0,3,0,2,1,0,0), D=c(1:10))
df
   A B C D
1  4 3 4  1
2  5 0 5  2
3  8 0 6  3
4  2 0 0  4
5  0 0 3  5
6  0 6 0  6
7  0 8 2  7
8  0 2 1  8
9  6 1 0  9
10 3 0 0 10

      

r dataframe data-cleaning




1 answer


You can use this function for your data:

cons.Zeros <- function (x, n)
{
    x <- x[!is.na(x)] == 0
    r <- rle(x)
    any(r$lengths[r$values] >= n)
}

      

This function returns TRUE for the columns to be dropped. n is the number of consecutive zeros that require the column to be dropped.
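To see why this works, it may help to look at what rle() returns for a single column. The following is only a small sketch using column A from the question; the variable name zero_runs is introduced here for illustration.

```r
# Column A from the question's example data
A <- c(4, 5, 8, 2, 0, 0, 0, 0, 6, 3)

# Run-length encode the logical vector "is this value zero?"
r <- rle(A == 0)
r$values   # FALSE  TRUE FALSE
r$lengths  # 4 4 2

# Keep only the lengths of the runs whose value is TRUE (the zero runs)
zero_runs <- r$lengths[r$values]
zero_runs            # 4

# The column is flagged if any zero run is at least n long (here n = 4)
any(zero_runs >= 4)  # TRUE
```

Note that the function removes NAs before encoding, so zeros on either side of an NA are counted as one run.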



For your sample dataset, using n = 3 (n = 4, the k from your example, gives the same result):

df.dropped <- df[, !sapply(df, cons.Zeros, n=3)]

#output:
# > df.dropped
#    C  D
# 1  4  1
# 2  5  2
# 3  6  3
# 4  0  4
# 5  3  5
# 6  0  6
# 7  2  7
# 8  1  8
# 9  0  9
# 10 0 10
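For the general "at least k consecutive zeros" case, the same idea could be wrapped in a single function. This is a minimal sketch, not a definitive implementation: the names has_zero_run and drop_zero_run_cols are hypothetical, vapply() is swapped in for sapply() so the result is always a logical vector, and drop = FALSE is added on the assumption that you want a data.frame back even when only one column survives.

```r
# Hypothetical helper: TRUE if x contains a run of at least k zeros.
# NAs are removed first, mirroring cons.Zeros above.
has_zero_run <- function(x, k) {
  r <- rle(x[!is.na(x)] == 0)
  any(r$lengths[r$values] >= k)
}

# Hypothetical wrapper: drop every column that has such a run.
drop_zero_run_cols <- function(df, k) {
  flagged <- vapply(df, has_zero_run, logical(1), k = k)
  df[, !flagged, drop = FALSE]  # drop = FALSE keeps a data.frame
}

# Example data from the question
df <- data.frame(A = c(4, 5, 8, 2, 0, 0, 0, 0, 6, 3),
                 B = c(3, 0, 0, 0, 0, 6, 8, 2, 1, 0),
                 C = c(4, 5, 6, 0, 3, 0, 2, 1, 0, 0),
                 D = 1:10)

names(drop_zero_run_cols(df, k = 4))  # "C" "D"
names(drop_zero_run_cols(df, k = 2))  # "D"
```

For data recorded every 5 minutes, one full day is 24 * 12 = 288 observations, so the call for the real data would use k = 288.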

      
