Generate data from truncated normal distribution with exact mean and sd in R
I am struggling with the following problem: I need to generate data from a truncated normal distribution, and the mean and standard deviation of the sample must exactly match the population values. This is what I have so far:
mean <- 100
sd <- 5
lower <- 40
upper <- 120
n <- 100
library(msm)
data <- as.numeric(mean+sd*scale(rtnorm(n, lower=40, upper=120)))
The generated sample has exactly the specified mean and sd. However, some values fall outside the intended bounds. Any idea how to fix this? I thought about simply discarding all values outside those bounds, but then the mean and sd no longer match the population values.
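The requirement has three parts (exact mean, exact sd, every value within the bounds), and it can help to state them as a single check that any candidate sample must pass. This is a minimal base-R sketch; `check_sample` and its `tol` argument are hypothetical names for illustration, not part of msm. The second call reproduces the failure mode described above: rescaling a skewed sample gives the exact mean and sd but can push a value well past the upper bound.

```r
# Hypothetical helper: report which of the three requirements a sample meets.
check_sample <- function(x, target_mean, target_sd, lower, upper, tol = 1e-8) {
  c(
    mean_ok   = abs(mean(x) - target_mean) < tol,
    sd_ok     = abs(sd(x) - target_sd) < tol,
    bounds_ok = min(x) >= lower && max(x) <= upper
  )
}

# A roughly symmetric sample rescaled to mean 100, sd 5 passes all three checks.
symmetric <- 100 + 5 * as.numeric(scale(c(1, 2, 3, 4, 5)))
check_sample(symmetric, 100, 5, 40, 120)

# A heavily skewed sample rescaled the same way still has mean 100 and sd 5,
# but its single large value ends up far above 120.
skewed <- 100 + 5 * as.numeric(scale(c(rep(0, 99), 1)))
check_sample(skewed, 100, 5, 40, 120)
```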
You can use an iterative approach. Here I add samples to the vector one at a time, keeping each new sample only if the resulting scaled dataset stays within the bounds you set. It is slower, but it works:
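library(msm)
n <- 10000
mean <- 100
sd <- 15
lower <- 40
upper <- 120
# Bounds on the standard-normal scale that rtnorm() samples from by default
z_lower <- (lower - mean) / sd
z_upper <- (upper - mean) / sd
data <- rtnorm(1, lower = z_lower, upper = z_upper)
while (length(data) < n) {
  candidate <- rtnorm(1, lower = z_lower, upper = z_upper)
  data_copy <- c(data, candidate)
  # Rescale to the exact target mean and sd, then keep the new draw
  # only if every rescaled value stays within [lower, upper]
  data_copy_scaled <- mean + sd * scale(data_copy)
  if (min(data_copy_scaled) >= lower && max(data_copy_scaled) <= upper) {
    data <- data_copy
  }
}
scaled_data <- as.numeric(mean + sd * scale(data))
summary(scaled_data)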
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  40.38   91.61  104.35  100.00  111.28  120.00 
sd(scaled_data)
15
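The same acceptance loop can be sketched in base R for readers who do not have msm installed. This is a sketch under stated assumptions: the helper names `rtrunc_std` and `exact_truncated_sample` are hypothetical, and `msm::rtnorm` is swapped for inverse-CDF sampling with `pnorm()`/`qnorm()`, which is adequate for bounds like these but loses precision when both bounds lie far out in one tail of the standard normal.

```r
# Inverse-CDF draws from N(0, 1) truncated to [a, b], using only base R.
# Stands in for msm::rtnorm(n, lower = a, upper = b) in this sketch.
rtrunc_std <- function(n, a, b) {
  qnorm(runif(n, pnorm(a), pnorm(b)))
}

# Hypothetical helper: grow a sample one draw at a time, keeping a draw only
# if the sample rescaled to the exact target mean and sd stays in [lower, upper].
# Assumes n >= 2 (the sd of a single value is undefined).
exact_truncated_sample <- function(n, target_mean, target_sd, lower, upper) {
  a <- (lower - target_mean) / target_sd
  b <- (upper - target_mean) / target_sd
  # Same transformation as as.numeric(target_mean + target_sd * scale(z))
  rescale <- function(z) target_mean + target_sd * (z - mean(z)) / sd(z)
  x <- rtrunc_std(1, a, b)
  while (length(x) < n) {
    candidate <- c(x, rtrunc_std(1, a, b))
    rescaled <- rescale(candidate)
    if (min(rescaled) >= lower && max(rescaled) <= upper) {
      x <- candidate
    }
  }
  rescale(x)
}

set.seed(1)
y <- exact_truncated_sample(1000, target_mean = 100, target_sd = 15,
                            lower = 40, upper = 120)
c(mean = mean(y), sd = sd(y), min = min(y), max = max(y))
```

Because the final rescaling is applied to exactly the vector that last passed the check, the returned values are guaranteed to lie within the bounds. As in the loop above, every candidate triggers a rescale of the whole vector, so the cost grows roughly quadratically with `n`.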
Below is my original answer, which doesn't quite work:
How about standardizing the lower and upper limits passed to rtnorm using the mean and sd you want?
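As a worked check of that idea with this answer's values (mean = 100, sd = 5), the limits passed to rtnorm become

$$
z_{\text{lower}} = \frac{40 - 100}{5} = -12, \qquad z_{\text{upper}} = \frac{120 - 100}{5} = 4,
$$

and the back-transformation $x = 100 + 5z$ maps these limits to 40 and 120. The catch is that scale() standardizes with the sample's own mean and sd rather than exactly 0 and 1, so draws close to $z = 4$ can be pushed slightly past 120 after rescaling.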
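library(msm)
n <- 1000000
mean <- 100
sd <- 5
# Standardize the bounds so rtnorm() samples a standard normal truncated
# to [(40 - mean)/sd, (120 - mean)/sd], then rescale to the exact mean and sd
data <- as.numeric(mean + sd * scale(rtnorm(n, lower = (40 - mean) / sd, upper = (120 - mean) / sd)))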
summary(data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  76.91   96.63  100.00  100.00  103.37  120.00 
sd(data)
5
In this case, even with a sample of 1,000,000, you get an accurate mean and sd, and the max and min values stay within your bounds.