dplyr summarise: standard error function

I can summarize my data and calculate the mean and sd values using:

summary <- aspen %>% group_by(year,Spp,CO2) %>% summarise_each(funs(mean,sd))

      

However, I cannot also calculate the standard error. I tried this without success:

summary <- aspen %>% group_by(year,Spp,CO2) %>% summarise_each(funs(mean,sd,se=sd/sqrt(n())))

      



3 answers


You can use the function std.error from the plotrix package, or define your own function first and pass that function name as an argument.



    library(dplyr)
    library(plotrix)
    summary <- aspen %>% group_by(year,Spp,CO2) %>% 
        summarise_each(funs(mean,sd,std.error))
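
The "define your own function" option could look roughly like the sketch below. The helper name `se` is hypothetical (it is not part of dplyr or plotrix), and the example uses mtcars because the aspen data is not available here. It also assumes dplyr 1.0.0 or later and swaps in across(), which replaced summarise_each()/funs() in newer versions.

```r
library(dplyr)

# Hypothetical helper: standard error of the mean (does not handle NAs)
se <- function(x) sd(x) / sqrt(length(x))

# Pass the function by name, alongside mean and sd
out <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(across(hp:drat, list(mean = mean, sd = sd, se = se)),
            .groups = "drop")

out
```

The result has one column per variable and statistic (hp_mean, hp_sd, hp_se, and so on), the same naming pattern summarise_each() produces.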

      



You could do

library(dplyr)
aspen %>% 
    group_by(year,Spp,CO2) %>%
    summarise_each(funs(mean,sd,se=sd(.)/sqrt(n())))

      

For reproducibility,



data(mtcars)
grpMt <- mtcars %>%
          group_by(gear, carb)

grpMt %>%
     summarise_each(funs(mean, sd, se=sd(.)/sqrt(n())), hp:drat) %>% 
     slice(1:2)
#   gear carb hp_mean drat_mean     hp_sd   drat_sd     hp_se    drat_se
#1    3    1   104.0    3.1800  6.557439 0.4779121  3.785939 0.27592269
#2    3    2   162.5    3.0350 14.433757 0.1862794  7.216878 0.09313968
#3    4    1    72.5    4.0575 13.674794 0.1532699  6.837397 0.07663496
#4    4    2    79.5    4.1625 26.913441 0.5397144 13.456721 0.26985722
#5    5    2   102.0    4.1000 15.556349 0.4666905 11.000000 0.33000000
#6    5    4   264.0    4.2200        NA        NA        NA         NA

      

which is the same as std.error from plotrix:

 library(plotrix)
 grpMt %>% 
    summarise_each(funs(mean, sd, se=std.error), hp:drat) %>% 
    slice(1:2)
 #  gear carb hp_mean drat_mean     hp_sd   drat_sd     hp_se    drat_se
 #1    3    1   104.0    3.1800  6.557439 0.4779121  3.785939 0.27592269
 #2    3    2   162.5    3.0350 14.433757 0.1862794  7.216878 0.09313968
 #3    4    1    72.5    4.0575 13.674794 0.1532699  6.837397 0.07663496
 #4    4    2    79.5    4.1625 26.913441 0.5397144 13.456721 0.26985722
 #5    5    2   102.0    4.1000 15.556349 0.4666905 11.000000 0.33000000
 #6    5    4   264.0    4.2200        NA        NA        NA         NA

      



An important addition to @akrun's answer:

If there are missing values (NA), you should use:

summarise_each(funs(mean(., na.rm=T), n = sum(!is.na(.)), se = sd(., na.rm=T)/sqrt(sum(!is.na(.)))), hp:drat)

Unfortunately, the function n() does not remove missing values, so in addition to using na.rm=T we need to replace n() with sum(!is.na(.)).

An illustration of how it can go wrong with some of my own data:

summarise_each(funs( mean(., na.rm=T), n1=n(), n2=sum(!is.na(.)), se1=sd(., na.rm=T)/sqrt(n()), se2=sd(., na.rm=T)/sqrt(sum(!is.na(.)))), rating)

In the output, dplyr's n() includes NAs; n2 and se2 are the correct values.
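
Since the data behind that illustration is not shown, here is a small sketch that reproduces the effect on a made-up data frame. The helper `se_na` and the toy data `df` are assumptions for illustration only; the code uses plain summarise(), so it should not depend on summarise_each()/funs().

```r
library(dplyr)

# Hypothetical helper: standard error that ignores missing values
se_na <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Toy data (made up): group "a" contains one NA
df <- data.frame(g = c("a", "a", "a", "b", "b"),
                 x = c(1, 2, NA, 4, 6))

res <- df %>%
  group_by(g) %>%
  summarise(n_all    = n(),               # counts rows, NA included
            n_valid  = sum(!is.na(x)),    # counts non-missing values only
            se_wrong = sd(x, na.rm = TRUE) / sqrt(n()),
            se       = se_na(x))

res
```

For group "a", n_all is 3 but n_valid is 2, so se_wrong divides by too large a count and understates the standard error; for group "b", which has no NA, the two versions agree.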
