Column means for non-zero data

Can R compute colMeans using only the non-zero values of a data frame?

data<-data.frame(col1=c(1,0,1,0,3,3),col2=c(5,0,5,0,7,7))
colMeans(data)   # 1.33,4

      

I would like something like:

mean(data$col1[data$col1>0]) # 2
mean(data$col2[data$col2>0]) # 6

      

Thanks in advance :D

+3




2 answers


You can change 0 to NA and then use colMeans, as it has an na.rm=TRUE option. In a two-step process, we convert the "0" elements to NA and then get the colMeans while excluding the NA elements.

  is.na(data) <- data==0
  colMeans(data, na.rm=TRUE) 
  #   col1 col2 
  #    2    6 
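
Note that the assignment above overwrites the zeros in data itself. If the original values are still needed, a minimal sketch (assuming a working copy under the hypothetical name tmp) could look like this:

```r
# Sample data from the question
data <- data.frame(col1 = c(1, 0, 1, 0, 3, 3), col2 = c(5, 0, 5, 0, 7, 7))

# `tmp` is a hypothetical name for a working copy; `data` keeps its zeros
tmp <- data
is.na(tmp) <- tmp == 0              # mark zeros as missing in the copy only
res <- colMeans(tmp, na.rm = TRUE)  # means over the remaining non-NA values
print(res)
```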

      

If you need it in one step, we can turn the logical matrix ( data==0 ) into NA and 1 with NA^(data==0): NA^TRUE is NA for the elements equal to "0", and NA^FALSE is 1 for the non-zero elements. Multiplying by the original data then restores the original value wherever the mask is 1 and leaves NA everywhere else. We can apply colMeans to this output as above.

   colMeans(NA^(data==0)*data, na.rm=TRUE)
   #  col1 col2 
   #   2    6 
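
To see why the one-liner works, the intermediate steps can be spelled out. This is only an illustrative sketch; the names pow, mask and masked are hypothetical and not part of the answer.

```r
data <- data.frame(col1 = c(1, 0, 1, 0, 3, 3), col2 = c(5, 0, 5, 0, 7, 7))

# TRUE is treated as 1 and FALSE as 0, so NA^TRUE is NA and NA^FALSE is 1
pow <- NA^c(TRUE, FALSE)

# Mask with NA where the data is zero and 1 everywhere else
mask <- NA^(data == 0)

# Multiplying keeps the non-zero values and turns the zeros into NA
masked <- mask * data

res <- colMeans(masked, na.rm = TRUE)
print(res)
```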

      



Another option is to use sapply/vapply. If the dataset is really large, converting to a matrix may not be a good idea, as it can cause memory problems. By looping through the columns with sapply or the more specific vapply (which will be a little faster), we get the mean of the non-zero elements.

 vapply(data, function(x) mean(x[x!=0]), numeric(1))
 #  col1 col2 
 #  2    6 
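
If the columns may already contain NA, the subset x[x!=0] keeps those NA entries and the mean becomes NA. A hedged sketch of one way to handle that, using a hypothetical helper nonzero_mean and a made-up data frame data_na:

```r
# Hypothetical helper: mean of the elements that are neither NA nor zero
nonzero_mean <- function(x) {
  keep <- !is.na(x) & x != 0
  mean(x[keep])
}

# Made-up variant of the question's data with one missing value
data_na <- data.frame(col1 = c(1, 0, NA, 0, 3, 3), col2 = c(5, 0, 5, 0, 7, 7))

# The plain subset propagates the NA in col1
plain <- vapply(data_na, function(x) mean(x[x != 0]), numeric(1))

# The helper drops it before averaging
res <- vapply(data_na, nonzero_mean, numeric(1))
print(res)
```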

      

Or we can use summarise_each and specify the function inside funs, after subsetting the non-zero elements.

 library(dplyr)
 summarise_each(data, funs(mean(.[.!=0])))
 #  col1 col2
 #1    2    6
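
In newer dplyr releases summarise_each and funs are deprecated. A rough equivalent, assuming dplyr 1.0.0 or later is installed, might use across instead:

```r
library(dplyr)

data <- data.frame(col1 = c(1, 0, 1, 0, 3, 3), col2 = c(5, 0, 5, 0, 7, 7))

# Apply the same "mean of non-zero elements" lambda to every column
res <- summarise(data, across(everything(), ~ mean(.x[.x != 0])))
print(res)
```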

      

+5




You can use colSums on both the data and its "boolean" version to divide the column sums by the number of non-zero elements in each column:



colSums(data)/colSums(!!data)
# col1 col2 
#    2    6 
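
Here !!data is just a compact way of writing data != 0. The sketch below spells that out and also shows one edge case worth knowing about: a column made entirely of zeros gives 0/0, which is NaN. The data frame data0 is made up for illustration.

```r
data <- data.frame(col1 = c(1, 0, 1, 0, 3, 3), col2 = c(5, 0, 5, 0, 7, 7))

# Logical matrix marking the non-zero cells (same values as !!data)
nonzero <- data != 0
res <- colSums(data) / colSums(nonzero)
print(res)

# Made-up example: col1 has no non-zero elements, so its result is 0/0
data0 <- data.frame(col1 = c(0, 0), col2 = c(4, 0))
res0 <- colSums(data0) / colSums(data0 != 0)
print(res0)
```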

      

+7








