Remove columns with standard deviation of zero
I want to remove all columns with a standard deviation of zero from a data.frame.
This does not work:
df <- df[, ! apply(df , 2 , function(x) sd(x)==0 ) ]
I get an error:
undefined columns selected
UPDATE
I chose Filter as my preferred answer because it also handles NAs, which is very helpful.
For example, in
df <- data.frame(v1=c(0,0,NA,0,0), v2=1:5)
column "v1" is removed with Filter, while the apply-based methods generate errors.
I also learned a lot from all the other solutions.
UPDATE2:
The errors from the apply-based approach can be fixed by adding na.rm = TRUE to the sd call as follows:
df[, ! apply(df , 2 , function(x) sd(x, na.rm = TRUE)==0 ) ]
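To see why the NA matters, here is a minimal sketch using the example data from the first update. The names is_const and cleaned are just illustrative, and drop = FALSE is my own addition so that the result stays a data.frame when only one column survives. Note that a column consisting entirely of NA would still give NA here, so this only covers columns with at least some observed values.

```r
# Example data from the question: v1 is constant apart from one NA
df <- data.frame(v1 = c(0, 0, NA, 0, 0), v2 = 1:5)

# Without na.rm, sd() returns NA for v1, so the logical index contains NA
is_const <- apply(df, 2, function(x) sd(x) == 0)
print(is_const)

# With na.rm = TRUE the NA is ignored and v1 is recognised as constant
is_const <- apply(df, 2, function(x) sd(x, na.rm = TRUE) == 0)

# drop = FALSE (an addition, not in the original call) keeps a data.frame
cleaned <- df[, !is_const, drop = FALSE]
print(names(cleaned))
```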
In addition to @grrgrrbla's and @akrun's answers using Filter, here's the correct way to do what you originally had in mind:
df <- df[, !sapply(df, function(x) { sd(x) == 0} )]
Or
df <- df[, sapply(df, function(x) { sd(x) != 0} )]
I used sapply() to get a logical vector that is TRUE when a data frame column has a standard deviation of 0 and FALSE otherwise. I then negate this vector and use it to index the columns of the original data frame.
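As a rough illustration of the two steps, here is a sketch on the four-column sample data posted elsewhere in this thread. The names zero_sd and result are made up for the example, and drop = FALSE is an optional extra that keeps the result a data.frame even if only one column is left.

```r
# Sample data: V2 is constant, the other columns vary
df <- data.frame(V1 = c(1, 4, 2, 5), V2 = c(2, 2, 2, 2),
                 V3 = c(3, 4, 3, 3), V4 = c(1, 2, 3, 3))

# Step 1: logical vector, TRUE where the column's standard deviation is zero
zero_sd <- sapply(df, function(x) sd(x) == 0)
print(zero_sd)

# Step 2: negate it and use it as a column index
result <- df[, !zero_sd, drop = FALSE]
print(names(result))
```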
You can just use Filter without an anonymous function, since an sd of 0 is coerced to FALSE and any other value is coerced to TRUE, so Filter keeps only the columns for which the result is TRUE, i.e. sd != 0.
Filter(sd, df)
Or, if there are columns of mixed classes, using length(unique) might be more general.
df[vapply(df, function(x) length(unique(na.omit(x)))>1, logical(1L))]
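For instance, a small sketch on made-up data with character and numeric columns (the data frame mixed and the names keep and kept are hypothetical, only there for illustration):

```r
# Hypothetical mixed-class data: two character columns, two numeric ones
mixed <- data.frame(id  = c("a", "a", "a"),   # constant character column
                    grp = c("x", "y", "x"),   # varying character column
                    val = c(1, NA, 1),        # constant once NA is dropped
                    num = c(1, 2, 3),         # varying numeric column
                    stringsAsFactors = FALSE)

# Keep columns with more than one distinct non-missing value
keep <- vapply(mixed, function(x) length(unique(na.omit(x))) > 1, logical(1L))
kept <- mixed[keep]
print(names(kept))
```

Because this counts distinct values instead of calling sd, it does not depend on the column being numeric.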
Or we can use the tidyverse:
library(tidyverse)
library(magrittr)
df %>%
map_lgl(~sd(.) !=0) %>%
extract(df, .)
data
df <- structure(list(V1 = c(1, 4, 2, 5), V2 = c(2, 2, 2, 2), V3 = c(3,
4, 3, 3), V4 = c(1, 2, 3, 3)), .Names = c("V1", "V2", "V3", "V4"
), row.names = c(NA, -4L), class = "data.frame")
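To make the coercion that Filter relies on visible, here is a short sketch on the same data, rebuilt with data.frame() for brevity. The names sds and filtered are only illustrative.

```r
# Same sample data as above
df <- data.frame(V1 = c(1, 4, 2, 5), V2 = c(2, 2, 2, 2),
                 V3 = c(3, 4, 3, 3), V4 = c(1, 2, 3, 3))

# Standard deviation of each column; V2 is constant, so its sd is 0
sds <- vapply(df, sd, numeric(1L))

# Zero is coerced to FALSE, any other number to TRUE
print(as.logical(sds))

# Filter applies sd to each column and keeps those coerced to TRUE
filtered <- Filter(sd, df)
print(names(filtered))
```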