Why is the first bar so big in my histogram?
I am playing with R. I am trying to represent the distribution of 1000 dice rolls with the following R script:
cases <- 1000
min <- 1
max <- 6
x <- as.integer(runif(cases,min,max+1))
mx <- mean(x)
sd <- sd(x)
hist(
  x,
  xlim = c(min - abs(mx/2), max + abs(mx/2)),
  main = paste(cases, "Samples"),
  freq = FALSE,
  breaks = seq(min, max, 1)
)
curve(dnorm(x, mx, sd), add = TRUE, col="blue", lwd = 2)
abline(v = mx, col = "red", lwd = 2)
legend("bottomleft",
       legend = c(paste('Mean (', mx, ')')),
       col = c('red'), lwd = 2, lty = c(1))
The script outputs the following histogram:
Can someone explain to me why the first bar is so big? I checked the data and everything looks fine. How can I fix this?
Thank you in advance!
Histograms are not suitable for discrete data; they are intended for continuous data. Your data looks something like this:
> table(x)
x
  1   2   3   4   5   6
174 138 162 178 196 152
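i.e. roughly equal counts for each value. But when you drew the histogram, you chose breakpoints at 1:6. By default hist() uses right-closed bins and also includes the lowest break in the first bin, so the first bar covers [1, 2]: it picks up the 174 entries at its left limit and the 138 at its right limit, and so displays 312.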
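The bin counts can be checked without drawing anything. This is a minimal sketch that assumes the sample has exactly the counts shown in the table above (rebuilt with rep() rather than drawn at random); the variable h is just an illustrative name.

```r
# Rebuild a sample with the counts from the table above (1000 values in total)
x <- rep(1:6, times = c(174, 138, 162, 178, 196, 152))

# Same breaks as in the question; plot = FALSE returns the bins without drawing
h <- hist(x, breaks = 1:6, plot = FALSE)

h$breaks  # six break points, so only five bins for six distinct values
h$counts  # the first bin is [1, 2] and holds the 1s and the 2s together
```

With six break points there are only five bins, so two of the six faces have to share one.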
You can get a nicer histogram by placing the breaks halfway between the integers, i.e. breaks = 0:6 + 0.5, so that each value falls in the middle of its own bin.
But it still doesn't make sense to use a histogram for such data. Simply running plot(table(x)) or barplot(table(x)) gives a more accurate display of the data.
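Both suggestions can be tried side by side. This is a rough sketch, again assuming a sample with the counts from the table above instead of fresh random draws; the plot title is only a placeholder.

```r
# Sample with the counts from the table above
x <- rep(1:6, times = c(174, 138, 162, 178, 196, 152))

# Breaks at 0.5, 1.5, ..., 6.5 give every face a bin of its own
h <- hist(x, breaks = 0:6 + 0.5, plot = FALSE)
h$counts  # one count per face, matching table(x)

# Draw the histogram on the density scale, as in the question
plot(h, freq = FALSE, main = "1000 Samples")

# For discrete data a bar plot of the frequency table is the more natural display
barplot(table(x))
```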
m0nhawk has identified the problem. Another issue might be the use of as.integer, which always truncates toward zero instead of rounding to the nearest integer (and therefore leans toward 1).
as.integer(1.7)
# 1
round(1.7)
# 2
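How much this matters depends on the range being sampled. The sketch below compares the two conversions on the question's range [1, 7), using an evenly spaced grid as a deterministic stand-in for runif draws (an assumption made only so the counts are reproducible). On that range, truncation appears to give each face an interval of equal width, while rounding would halve the weight of 1 and produce a 7.

```r
as.integer(1.7)    # 1: the fractional part is dropped
as.integer(-1.7)   # -1: truncation is toward zero, unlike floor(-1.7), which is -2
round(1.7)         # 2

# 6000 evenly spaced points in [1, 7), chosen to avoid exact integers and .5 ties
u <- 1 + (0:5999 + 0.5) / 1000

trunc_counts <- table(as.integer(u))  # faces 1..6, 1000 points each
round_counts <- table(round(u))       # values 1..7, with 1 and 7 at half weight

trunc_counts
round_counts
```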
Finally, I'm not sure why one would fit a Gaussian to a uniform distribution. Generating the numbers with rnorm, rather than runif, would make more sense.
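As a rough illustration of that last point, the sketch below draws from rnorm with the mean and standard deviation of a fair six-sided die and overlays the matching normal curve. The seed, the sample size, and the names faces, mu, sigma, and y are assumptions made for this example only.

```r
# Mean and standard deviation of a fair six-sided die
faces <- 1:6
mu <- mean(faces)                     # 3.5
sigma <- sqrt(mean((faces - mu)^2))   # population sd, sqrt(35/12)

set.seed(1)  # arbitrary seed, only for reproducibility
y <- rnorm(1000, mean = mu, sd = sigma)

# Continuous data: a histogram with a normal curve on top is appropriate here
hist(y, freq = FALSE, main = "1000 normal samples")
curve(dnorm(x, mu, sigma), add = TRUE, col = "blue", lwd = 2)
```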