How can I remove columns containing NA or variance equal to 0

Question


I want to scale my data with scale() before doing PCA, but some columns contain NA and some columns have a variance of 0, and I want to delete those columns. This is an example of my data:

df <- data.frame(v1 = 1:10, v2 = rep(0, 10), v3 = sample(c(1:3, NA), 10, replace = TRUE), v4 = 1:10)
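As a quick illustration of why such columns get in the way, here is a minimal sketch, assuming base R's scale() is used for the scaling step: a constant column like v2 is centered to 0 and then divided by a standard deviation of 0, so every scaled value becomes NaN.

```r
# A constant column, like v2 in the example data
v2 <- rep(0, 10)

# scale() centers (0 - 0 = 0) and then divides by the column's sd, which is 0,
# so every entry is 0/0
scaled_v2 <- scale(v2)

# All entries are NaN, which PCA routines such as prcomp() cannot work with
print(all(is.nan(scaled_v2)))
```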

I want to delete columns v2 and v3 at the same time. How can I implement this?

I know how to first delete the columns containing NA and then delete the columns whose variance is 0:

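# sd() of a column that contains NA is NA, so drop those columns first
colsd <- apply(df, 2, sd)
df2 <- df[!is.na(colsd)]
# then drop the columns whose standard deviation is 0
colsd2 <- apply(df2, 2, sd)
df3 <- df2[colsd2 != 0]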

But this looks redundant. Is there a more concise way to do it, perhaps in a single line? Thanks for any answer.


Tags: r, dataframe

Asked by Zihu guo, May 29 '15 at 1:15 am


1 answer

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2015-05-29T01:20:07+0000

You can try something like:

> df[!sapply(df, var) %in% c(0, NA)]
   v1 v4
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
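This one-liner works because %in% is built on match(), which treats NA as a value that can be matched, so a column whose variance is NA is caught by the same test as a column whose variance is 0. Below is a minimal sketch that checks this on a deterministic version of the data (the fixed v3 values are hypothetical, standing in for the random sample() call), adds a more explicit vapply() variant that does not depend on NA matching, and passes the result to prcomp() with scale. = TRUE, assuming PCA with base R's stats package is the end goal.

```r
# Deterministic stand-in for the example data; v3 is a hypothetical fixed draw
# that contains NA (the question generates it with sample())
df <- data.frame(
  v1 = 1:10,
  v2 = rep(0, 10),
  v3 = c(1, 3, NA, 2, 2, 1, NA, 3, 1, 2),
  v4 = 1:10
)

# Column variances: v2 is 0, and v3 is NA because var() keeps NAs by default
col_var <- sapply(df, var)

# %in% matches NA against NA, so both problem columns are flagged at once
drop <- col_var %in% c(0, NA)
df_clean <- df[!drop]

# A more explicit equivalent: anyNA() tests for NA, and && skips var()
# for any column that already contains NA
keep <- vapply(df, function(x) !anyNA(x) && var(x) != 0, logical(1))
df_clean2 <- df[keep]

# The remaining columns can now be scaled inside prcomp()
pca <- prcomp(df_clean, scale. = TRUE)
print(names(df_clean))
```

The vapply() variant trades brevity for readability: the NA check is spelled out instead of relying on how match() handles NA.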

