How can I remove columns that contain NA or have a variance of 0?

I want to scale my data before doing PCA, but I found that some columns contain NA and some columns have a variance of 0. I want to delete those columns. This is an example of my data:

df <- data.frame(v1 = 1:10, v2 = rep(0, 10), v3 = sample(c(1:3, NA), 10, replace = TRUE), v4 = 1:10)

      

I want to delete columns v2 and v3 at the same time. How can I implement this?

I know how to delete the columns containing NA first and then delete the columns whose variance is 0:

# Columns containing NA get sd = NA; drop them first
colsd <- apply(df, 2, sd)
df2 <- df[!is.na(colsd)]
# Then drop the columns whose sd is 0
colsd2 <- apply(df2, 2, sd)
df3 <- df2[colsd2 != 0]

      

But this looks redundant. Is there a more efficient way to do this, perhaps in just one line? Thanks for any answer.



1 answer


You can try something like:



> df[!sapply(df, var) %in% c(0, NA)]
   v1 v4
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
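In case it helps to see why this works, here is a rough step-by-step sketch of the same idea. The fixed v3 values and the names col_var, keep and df_clean are only for illustration (they are not from the question); v3 is hard-coded so that it always contains an NA. The point is that `==` would return NA for the v3 column, which cannot be used for indexing, while `%in%` always returns TRUE or FALSE.

```r
# Deterministic variant of the example data (v3 always contains an NA here;
# these fixed values are an assumption made for reproducibility).
df <- data.frame(
  v1 = 1:10,
  v2 = rep(0, 10),
  v3 = c(1, 2, 3, NA, 1, 2, 3, NA, 1, 2),
  v4 = 1:10
)

# Per-column variance: v2 gives 0, v3 gives NA.
col_var <- sapply(df, var)

# `%in%` is built on match(), so an NA variance matches the NA in the set
# and yields TRUE instead of propagating NA the way `==` would.
keep <- !(col_var %in% c(0, NA))

# Keep only the columns with a defined, non-zero variance.
df_clean <- df[keep]
names(df_clean)
```

Note that this tests for a variance of exactly 0, as in the question; a column that is only nearly constant would be kept.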

      
