dplyr summarise standard error function
I can summarize my data and calculate the mean and sd values using:
summary <- aspen %>% group_by(year,Spp,CO2) %>% summarise_each(funs(mean,sd))
However, I cannot get it to calculate the standard error as well. I tried this without success:
summary <- aspen %>% group_by(year,Spp,CO2) %>% summarise_each(funs(mean,sd,se=sd/sqrt(n())))
You could do
library(dplyr)
aspen %>%
  group_by(year,Spp,CO2) %>%
  summarise_each(funs(mean,sd,se=sd(.)/sqrt(n())))
For reproducibility,
data(mtcars)
grpMt <- mtcars %>%
  group_by(gear, carb)
grpMt %>%
  summarise_each(funs(mean, sd, se=sd(.)/sqrt(n())), hp:drat) %>%
  slice(1:2)
#   gear carb hp_mean drat_mean     hp_sd   drat_sd     hp_se    drat_se
#1     3    1   104.0    3.1800  6.557439 0.4779121  3.785939 0.27592269
#2     3    2   162.5    3.0350 14.433757 0.1862794  7.216878 0.09313968
#3     4    1    72.5    4.0575 13.674794 0.1532699  6.837397 0.07663496
#4     4    2    79.5    4.1625 26.913441 0.5397144 13.456721 0.26985722
#5     5    2   102.0    4.1000 15.556349 0.4666905 11.000000 0.33000000
#6     5    4   264.0    4.2200        NA        NA        NA         NA
which is the same as std.error from plotrix:
library(plotrix)
grpMt %>%
  summarise_each(funs(mean, sd, se=std.error), hp:drat) %>%
  slice(1:2)
#   gear carb hp_mean drat_mean     hp_sd   drat_sd     hp_se    drat_se
#1     3    1   104.0    3.1800  6.557439 0.4779121  3.785939 0.27592269
#2     3    2   162.5    3.0350 14.433757 0.1862794  7.216878 0.09313968
#3     4    1    72.5    4.0575 13.674794 0.1532699  6.837397 0.07663496
#4     4    2    79.5    4.1625 26.913441 0.5397144 13.456721 0.26985722
#5     5    2   102.0    4.1000 15.556349 0.4666905 11.000000 0.33000000
#6     5    4   264.0    4.2200        NA        NA        NA         NA
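In more recent dplyr releases, summarise_each() and funs() are deprecated. A rough equivalent using across() might look like the sketch below. It assumes dplyr 1.0.0 or later, and the helper name se is my own choice, not something dplyr or plotrix provides.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): standard error of the mean,
# counting only the non-missing observations.
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

res <- mtcars %>%
  group_by(gear, carb) %>%
  # across() applies each named function to hp and drat; the result
  # columns are named <column>_<function>, e.g. hp_mean, hp_se.
  summarise(across(hp:drat, list(mean = mean, sd = sd, se = se)),
            .groups = "drop")

head(res)
```

The se values should agree with the hp_se and drat_se columns shown above, since mtcars has no missing values.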
An important addition to @akrun's answer:
If the data contain missing values (NA), you should use:
summarise_each(funs(mean(., na.rm=T), n = sum(!is.na(.)), se = sd(., na.rm=T)/sqrt(sum(!is.na(.)))), hp:drat)
Unfortunately, the function n() does not exclude missing values, so in addition to using na.rm=T we need to replace n() with sum(!is.na(.)).
An illustration of how this can go wrong, using some of my own data:
summarise_each(funs(
mean(., na.rm=T), n1=n(), n2=sum(!is.na(.)),
se1=sd(., na.rm=T)/sqrt(n()), se2=sd(., na.rm=T)/sqrt(sum(!is.na(.)))), rating)
n2 and se2 are the correct values.
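For a self-contained version of the same comparison, here is a small sketch on made-up data. The data frame toy and its values are invented for illustration and are not the data used above.

```r
library(dplyr)

# Invented toy data: one group, four rows, one of them missing.
toy <- data.frame(g = c("a", "a", "a", "a"),
                  rating = c(1, 2, 3, NA))

out <- toy %>%
  group_by(g) %>%
  summarise(
    n1  = n(),                    # counts all rows, including the NA
    n2  = sum(!is.na(rating)),    # counts only non-missing values
    se1 = sd(rating, na.rm = TRUE) / sqrt(n()),
    se2 = sd(rating, na.rm = TRUE) / sqrt(sum(!is.na(rating)))
  )

out
# n1 = 4, n2 = 3, se1 = 0.5, se2 = 0.5773503
```

Because n() counts the NA row, se1 divides by sqrt(4) instead of sqrt(3) and understates the standard error.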