scale_y_log10() and coord_trans(ytrans = 'log10') lead to different results
I am using log transforms for my statistical analyses (of reaction times), and now I want to plot my data on a log y-axis. When I use coord_trans(ytrans = "log10"), I get correct results, but I need bars instead of points in my chart. When I use scale_y_log10(), bars work, but the values are wrong: bar1 has a mean of 833 but is displayed above 900, and bar2 has a mean of 568 but appears closer to 500.
```r
library(ggplot2)  # mean_cl_normal() additionally needs the Hmisc package installed

set.seed(10)
bar1 <- abs(rnorm(n = 232, mean = 833, sd = 1103)) + 1
bar2 <- abs(rnorm(n = 393, mean = 568, sd = 418)) + 1
graph_data <- data.frame(RT = c(bar1, bar2), group = c(rep(1, 232), rep(2, 393)))

# Points + CI on a log10 coordinate system
# (fun = replaces the deprecated fun.y =; y = "log10" is the current name for ytrans = "log10")
ggplot(graph_data, aes(group, RT)) +
  stat_summary(fun = mean, geom = "point", position = "dodge") +
  stat_summary(fun.data = mean_cl_normal, geom = "pointrange", position = position_dodge(width = .9)) +
  coord_trans(y = "log10")

# Bars + CI on a log10 scale
ggplot(graph_data, aes(group, RT)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge") +
  stat_summary(fun.data = mean_cl_normal, geom = "pointrange", position = position_dodge(width = .9)) +
  scale_y_log10(breaks = seq(300, 1000, 100))
```
Thanks for the help!
There are two reasons why you get different values.

First, the help page for coord_trans() says:

> coord_trans() is different to scale transformations in that it occurs after statistical transformation and will affect the visual appearance of geoms - there is no guarantee that straight lines will continue to be straight.
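This means that with coord_trans() the summary statistics are computed on your raw data and only the y-axis is then drawn on a log10 scale, whereas with scale_y_log10() your data are log-transformed before any other calculation. stat_summary() therefore averages log10(RT), and once the axis is labelled in the original units, each bar shows the geometric mean of RT rather than its arithmetic mean.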
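To check this, the sketch below builds both plots from the question's data (without the confidence-interval layers) and pulls the computed summaries out with ggplot2's layer_data(), which returns a layer's computed values in transformed scale units. It assumes ggplot2 3.3.0 or later, where stat_summary() takes fun = instead of fun.y =; the names p_coord, p_scale, arith and geo are illustrative only.

```r
library(ggplot2)

set.seed(10)
bar1 <- abs(rnorm(n = 232, mean = 833, sd = 1103)) + 1
bar2 <- abs(rnorm(n = 393, mean = 568, sd = 418)) + 1
graph_data <- data.frame(RT = c(bar1, bar2), group = c(rep(1, 232), rep(2, 393)))

# Same summary, once with a transformed coordinate system, once with a transformed scale
p_coord <- ggplot(graph_data, aes(group, RT)) +
  stat_summary(fun = mean, geom = "point") +
  coord_trans(y = "log10")
p_scale <- ggplot(graph_data, aes(group, RT)) +
  stat_summary(fun = mean, geom = "bar") +
  scale_y_log10()

# layer_data() gives the stat's output; for p_scale it is in log10 units, so back-transform
y_coord <- layer_data(p_coord, 1)$y       # computed on raw RT
y_scale <- 10^layer_data(p_scale, 1)$y    # computed on log10(RT)

# Reference values computed directly, per group
arith <- as.numeric(tapply(graph_data$RT, graph_data$group, mean))
geo <- as.numeric(tapply(graph_data$RT, graph_data$group, function(x) 10^mean(log10(x))))
```

If the explanation holds, y_coord matches the arithmetic means and y_scale matches the geometric means, which by the AM-GM inequality are never larger than the arithmetic means.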
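Second, if your data contain negative values (as they do when the RTs are simulated with rnorm() without abs()), log10() turns them into NaN when you apply scale_y_log10(). Those rows are removed, so all calculations are done on only the positive part of your data, and dropping the lowest values biases the mean upward relative to the coord_trans() version. R reports this with warnings such as: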
```
Warning messages:
1: In scale$trans$trans(x) : NaNs produced
2: In scale$trans$trans(x) : NaNs produced
3: Removed 100 rows containing missing values (stat_summary).
4: Removed 100 rows containing missing values (stat_summary).
```
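The effect of negative values can be reproduced without plotting. This hypothetical sketch redraws bar1's values without abs() + 1 (with the same seed, so bar1 equals abs(raw) + 1), applies log10() the way scale_y_log10() would, and compares the mean of all values with the mean of the rows that survive. The names raw, logged and kept are illustrative.

```r
set.seed(10)
# Hypothetical: the same draw as bar1 but *without* abs() + 1, so some RTs are negative
raw <- rnorm(n = 232, mean = 833, sd = 1103)

# log10() of a negative number is NaN (with a warning), as inside scale_y_log10()
logged <- suppressWarnings(log10(raw))
kept <- raw[!is.nan(logged)]  # rows that would survive and reach stat_summary()

n_removed <- sum(is.nan(logged))  # rows ggplot2 would report as removed
mean_all <- mean(raw)             # the average coord_trans() works from
mean_kept <- mean(kept)           # average of the surviving, strictly positive rows
```

Because every removed value is negative and every kept value is positive, mean_kept is necessarily larger than mean_all whenever at least one row is dropped.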