Visualize the difference from a target value using a bar chart
I am trying to visualize the offset of a time series from its baseline value using bar charts from the R package ggplot2. For example, take the following synthetic data:
baseline = 400
steps <- sample(0:10,50,replace=TRUE) - sample(0:10,50,replace=TRUE)
value <- cumsum(steps) + baseline
time = 1:50
data <- data.frame(time,value)
print(value)
[1] 400 400 397 397 393 400 394 395 389 389 385 395 400 399 405 403 399 401 399 401
[21] 401 401 398 397 395 395 401 402 393 400 399 398 406 412 417 413 410 401 400 399
[41] 394 401 406 406 401 404 411 413 404 402
I can draw the diagram at its original scale, but this is not very informative:
library(ggplot2)
library(plyr)
# posneg is the sign of the offset; values exactly on the baseline count as positive
longdata <- ddply( data, "value", transform, posneg=sign(value-baseline) )
longdata[longdata$posneg == 0,'posneg'] <- 1
p_aes <- aes( time, value, fill=factor(posneg))
p_scale <- scale_fill_brewer( palette='Set1', guide='none' )
p_geom <- geom_bar( stat='identity', position='identity' )
ggplot(longdata) + p_aes + p_scale + p_geom
Shifting the y aesthetic (i.e. y = value - baseline) gives me the chart I want to show, and it is nice and easy:
longdata <- ddply( data, "value", transform, posneg=sign(value-baseline) )
longdata[longdata$posneg == 0,'posneg'] <- 1
p_aes <- aes( time, value-baseline, fill=factor(posneg))
p_scale <- scale_fill_brewer( palette='Set1', guide='none' )
p_geom <- geom_bar( stat='identity', position='identity' )
ggplot(longdata) + p_aes + p_scale + p_geom
Unfortunately, the y-axis is now scaled as an offset from the baseline, that is, as value - baseline. However, I want the y-axis to keep the original values (between 380 and 420).
Is there a way to keep the original y-axis scale for the second plot? Do you have any other suggestions for visualizing differences from a target value?
Add a function that converts an offset back to the original value:
yaxis_format <- function(x) {
  baseline + x
}
and then pass it to scale_y_continuous(labels = yaxis_format) to relabel the axis:
ggplot(longdata) + p_aes + p_scale + p_geom + scale_y_continuous(labels = yaxis_format)
The final code should look like this:
library(ggplot2)
library(plyr)
set.seed(201)
baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
value <- cumsum(steps) + baseline
time <- 1:50
data <- data.frame(time, value)
# Label each break (an offset) with the value it corresponds to
yaxis_format <- function(x) {
  baseline + x
}
longdata <- ddply(data, "value", transform, posneg = sign(value - baseline))
longdata[longdata$posneg == 0, 'posneg'] <- 1
p_aes <- aes(time, value - baseline, fill = factor(posneg))
p_scale <- scale_fill_brewer(palette = 'Set1', guide = 'none')
p_geom <- geom_bar(stat = 'identity', position = 'identity')
# The trailing + keeps the expression going; a line that starts with + is an error in R
ggplot(longdata) + p_aes + p_scale + p_geom +
  scale_y_continuous(labels = yaxis_format) +
  ylab("Value")
Note that the labels must be baseline + x, not baseline - x. With the subtraction the axis labels run downward, and switching to scale_y_reverse does not fix that: it only flips the bars so that values above the baseline point down.
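To avoid hard-coding the baseline inside the label function, one option is a small factory that returns the label function for a given baseline. This is only a sketch: make_offset_labels is a hypothetical helper name, not part of ggplot2, and it assumes the y aesthetic is value - baseline, so that a break at offset x should be labeled baseline + x.

```r
# Hypothetical helper (not part of ggplot2): builds a label function for an
# axis that plots offsets (value - baseline) but should display original values.
make_offset_labels <- function(baseline) {
  force(baseline)              # capture the baseline at creation time
  function(x) baseline + x     # map an offset back to the original units
}

yaxis_format <- make_offset_labels(400)

# Quick check in base R: offsets of -20 ... 20 map to 380 ... 420
yaxis_format(c(-20, -10, 0, 10, 20))
```

The result can be passed directly to the scale, e.g. scale_y_continuous(labels = make_offset_labels(baseline)), so the plot code does not need to change if the baseline does.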
Instead of relabeling the y-axis after the fact, another solution is to use geom_linerange and make the lines wide enough for the particular chart (other geoms, such as crossbars or error bars, may also be suitable).
p <- ggplot(data=longdata, aes(x = time, color = factor(posneg))) +
  geom_linerange(aes(ymax = value, ymin = baseline), size = 3) +
  scale_color_brewer( palette='Set1', guide='none' )
p
This is a perfectly reasonable plot, provided you make clear that the bars start at the baseline rather than at zero. However, the convention that bars start at zero is so strong that the plot may be misinterpreted. Also, points whose value equals the baseline are not drawn at all, which makes them look like missing data.
A line chart with a horizontal line marking the baseline is sufficient to display the same information; no color is needed.
p2 <- ggplot(data=longdata, aes(x = time, y = value)) +
geom_line() + geom_point() + geom_hline(yintercept=baseline)
p2
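If bar-like shapes on the original scale are still preferred, the geom_linerange idea can also be drawn with geom_rect, which sets the bar width in data units rather than as a line size. The following is a sketch under stated assumptions: half_width and p3 are illustrative names, posneg is computed with base R instead of plyr, and the added geom_hline is only there to make the non-zero baseline explicit.

```r
library(ggplot2)

set.seed(201)
baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
data <- data.frame(time = 1:50, value = cumsum(steps) + baseline)

# Sign of the offset; values exactly on the baseline are grouped with positives
data$posneg <- ifelse(data$value >= baseline, 1, -1)

half_width <- 0.45  # assumed half-width of each bar, in x-axis units

# Each rectangle spans from the baseline to the value, on the original y scale
p3 <- ggplot(data) +
  geom_rect(aes(xmin = time - half_width, xmax = time + half_width,
                ymin = pmin(value, baseline), ymax = pmax(value, baseline),
                fill = factor(posneg))) +
  geom_hline(yintercept = baseline) +
  scale_fill_brewer(palette = "Set1", guide = "none")
p3
```

Values equal to the baseline still produce zero-height rectangles, so the caveat about apparently missing data applies here as well.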