Visualize the difference from a target value using bar charts

I am trying to visualize how far a time series is offset from its baseline value, using bar charts from the R package ggplot2. For example, take the following synthetic data:

library(ggplot2)

baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
value <- cumsum(steps) + baseline
time <- 1:50
data <- data.frame(time, value)

print(value)
 [1] 400 400 397 397 393 400 394 395 389 389 385 395 400 399 405 403 399 401 399 401
[21] 401 401 398 397 395 395 401 402 393 400 399 398 406 412 417 413 410 401 400 399
[41] 394 401 406 406 401 404 411 413 404 402

      

I can draw the chart on its original scale, but it is not very informative:

# posneg is 1 at or above the baseline and -1 below it
longdata <- transform(data, posneg = ifelse(value >= baseline, 1, -1))
p_aes <- aes(time, value, fill = factor(posneg))
p_scale <- scale_fill_brewer(palette = "Set1", guide = "none")
p_geom <- geom_bar(stat = "identity", position = "identity")
ggplot(longdata) + p_aes + p_scale + p_geom

      


By shifting the y aesthetic to the offset (i.e. y = value - baseline), I get the chart I want to show, which is nice and simple:

# posneg is 1 at or above the baseline and -1 below it
longdata <- transform(data, posneg = ifelse(value >= baseline, 1, -1))
# Bars now start at the baseline: y is the offset, not the raw value
p_aes <- aes(time, value - baseline, fill = factor(posneg))
p_scale <- scale_fill_brewer(palette = "Set1", guide = "none")
p_geom <- geom_bar(stat = "identity", position = "identity")
ggplot(longdata) + p_aes + p_scale + p_geom

      


Unfortunately, the y-axis now shows the offset from the baseline, that is, value - baseline. However, I want the y-axis to keep the original values (between 380 and 420).

Is there a way to keep the original y-axis scale in the second plot? Are there any other recommended ways to visualize differences from a target value?



2 answers


Add a function:

yaxis_format <- function(x) {
  400 - x
}

      

and then use scale_y_continuous(labels = yaxis_format) to relabel the axis:

ggplot(longdata) + p_aes + p_scale + p_geom + scale_y_continuous(labels = yaxis_format)
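The label function receives the break positions on the offset scale (value - baseline) and has to return what each break represents in original units, which means adding the baseline back. A quick check outside ggplot2 compares that with the 400 - x formula used here; offset_labels is a hypothetical name and the breaks are just example offsets, not taken from the plot:

```r
baseline <- 400
breaks <- c(-20, -10, 0, 10, 20)  # example tick positions in offset units

# Hypothetical label function: offset -> original value
offset_labels <- function(x) x + baseline
offset_labels(breaks)
#> [1] 380 390 400 410 420

# Subtracting instead mirrors every offset around the baseline
400 - breaks
#> [1] 420 410 400 390 380
```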

      

Putting it all together, the final code is:



library(ggplot2)

set.seed(201)

baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
value <- cumsum(steps) + baseline
time <- 1:50
data <- data.frame(time, value)

yaxis_format <- function(x) {
  400 - x
}

# posneg is 1 at or above the baseline and -1 below it
longdata <- transform(data, posneg = ifelse(value >= baseline, 1, -1))
p_aes <- aes(time, value - baseline, fill = factor(posneg))
p_scale <- scale_fill_brewer(palette = "Set1", guide = "none")
p_geom <- geom_bar(stat = "identity", position = "identity")
# The trailing "+" keeps R reading the expression onto the next line
ggplot(longdata) + p_aes + p_scale + p_geom +
  scale_y_continuous(labels = yaxis_format) +
  ylab("Value")

      


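Notice that the scale is odd: with 400 - x, a bar reaching an offset of +10 (a value of 410) is labelled 390, so the labels run in the wrong direction. Reversing the axis with scale_y_reverse makes the labels ascend again, but it also flips the bars, so values above the baseline point downward to a lower label. The direct fix is to add the baseline back to the offset instead of subtracting it:

ggplot(longdata) + p_aes + p_scale + p_geom +
  scale_y_continuous(labels = function(x) x + baseline) +
  ylab("Value")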
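If it helps to see both the offset and the original values, ggplot2's secondary axis can show them side by side. A sketch, assuming a ggplot2 version that provides sec_axis() and geom_col(); the data are regenerated with set.seed(201) so the block runs on its own:

```r
library(ggplot2)

set.seed(201)
baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
longdata <- data.frame(time = 1:50, value = cumsum(steps) + baseline)
# 1 at or above the baseline, -1 below it
longdata$posneg <- ifelse(longdata$value >= baseline, 1, -1)

p <- ggplot(longdata, aes(time, value - baseline, fill = factor(posneg))) +
  geom_col() +
  scale_fill_brewer(palette = "Set1", guide = "none") +
  # Left axis: offset from the baseline; right axis: the same breaks
  # shifted back into original units
  scale_y_continuous(
    name = "Offset from baseline",
    sec.axis = sec_axis(~ . + baseline, name = "Value")
  )
p
```

Because the secondary axis is a one-to-one transform of the primary one, its breaks line up with the bars without any manual relabelling.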

      




Another solution, instead of changing the y-axis labels after the fact, is to use geom_linerange and make the lines wide enough for the particular chart (other geoms, such as geom_crossbar or geom_errorbar, may also be suitable).

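# Each line runs from the baseline to the observed value;
# ggplot2 3.4.0 and later use linewidth instead of size for lines
p <- ggplot(data = longdata, aes(x = time, color = factor(posneg))) +
     geom_linerange(aes(ymax = value, ymin = baseline), linewidth = 3) +
     scale_color_brewer(palette = "Set1", guide = "none")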

p
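With geom_linerange the bar width is a line width in millimetres, so it has to be tuned for each chart size. A possible alternative (not part of the original answer) is geom_rect, whose width is given in data units; here each bar spans 0.9 of a time step, a value chosen only for illustration. The data are regenerated so the block runs on its own:

```r
library(ggplot2)

set.seed(201)
baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
longdata <- data.frame(time = 1:50, value = cumsum(steps) + baseline)
longdata$posneg <- ifelse(longdata$value >= baseline, 1, -1)

# Rectangles from the baseline to each value; the width follows the
# x scale, so it stays proportional when the plot is resized
p <- ggplot(longdata) +
  geom_rect(aes(xmin = time - 0.45, xmax = time + 0.45,
                ymin = baseline, ymax = value,
                fill = factor(posneg))) +
  scale_fill_brewer(palette = "Set1", guide = "none") +
  ylab("Value")
p
```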

      


This is a perfectly reasonable plot, and drawing the bars from the baseline sidesteps the rescaling problem altogether. But the convention that bars start at zero is so strong that the chart may be misread. Also, observations exactly at the baseline produce zero-length bars that are not drawn at all, which makes them look like missing data.
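One way to keep observations that sit exactly on the baseline visible (a sketch, not part of the original answer) is to overlay a point at every value and draw the baseline itself, so a zero-length bar still leaves a mark. The linewidth and point size are illustrative choices:

```r
library(ggplot2)

set.seed(201)
baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
longdata <- data.frame(time = 1:50, value = cumsum(steps) + baseline)
longdata$posneg <- ifelse(longdata$value >= baseline, 1, -1)

p <- ggplot(longdata, aes(x = time, colour = factor(posneg))) +
  geom_linerange(aes(ymin = baseline, ymax = value), linewidth = 3) +
  # A black point at every observation, including those on the baseline
  geom_point(aes(y = value), colour = "black", size = 1.5) +
  # Draw the baseline so the reference level is explicit
  geom_hline(yintercept = baseline) +
  scale_colour_brewer(palette = "Set1", guide = "none") +
  ylab("Value")
p
```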



A line chart with a horizontal line marking the baseline shows the same information, and no color is needed:

p2 <- ggplot(data=longdata, aes(x = time, y = value)) + 
      geom_line() + geom_point() + geom_hline(yintercept=baseline)

p2

      

