R: error in 2l.norm when there is only 1 NA
After specifying the "2l.norm" method when invoking the mice, I came across an error message for variables containing only 1 NA. I understand this is a very minor issue given the very small amount of missing data for these variables. However, it would be elegant to consider the data structure for them.
I recovered the situation using a publicly accessible database, the ChickWeight dataset. I am very aware that the problem could also be due to a bug in my procedure implementation, so please let me know if this is the case.
ChickWeight[1:20, ]
dim(ChickWeight)
sum(is.na(ChickWeight)) #contains no NAs
ChickWeight$weight[12] <- NA # add 1 NA
ChickWeight$constant <- 1 #add a constant
ChickWeight$Chick <- as.numeric(levels(ChickWeight$Chick)[ChickWeight$Chick]) #class variable has to be an integer
ini <- mice(ChickWeight, maxit = 0)
pred <- ini$predictorMatrix
pred["weight", ] <- c(0, 2, -2, 1, 2)
method <- ini$method
method["weight"] <- "2l.norm"
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)
The last command results in:
Error in [<-.data.frame
( *tmp*
, i, value = c (37.3233463394145, 159.862324738397: replacement has 2 rows, data has 1
Adding one extra NA solves the problem
ChickWeight$weight[13] <- NA # add another NA
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)
Does anyone know what might be causing the error?
source to share
The answer was kindly provided to me by email from Stef van Buuren, the author of the mouse pack. There is a very minor omission in the mice.impute.2l.norm function in the current version of the mouse package (2.22) and will be fixed in the next version.
The correct code is as follows: in the fourth from the last row, only "drop = FALSE" is added to rowSums (as.matrix (x [nry, type == 2):
mice.impute.2l.norm2 <-
function (y, ry, x, type, intercept = TRUE, ...)
{
rwishart <- function(df, p = nrow(SqrtSigma), SqrtSigma = diag(p)) {
Z <- matrix(0, p, p)
diag(Z) <- sqrt(rchisq(p, df:(df - p + 1)))
if (p > 1) {
pseq <- 1:(p - 1)
Z[rep(p * pseq, pseq) + unlist(lapply(pseq, seq))] <- rnorm(p *
(p - 1)/2)
}
crossprod(Z %*% SqrtSigma)
}
force.chol <- function(x, warn = TRUE) {
z <- 0
repeat {
lambda <- 0.1 * z
XT <- x + diag(x = lambda, nrow = nrow(x))
XT <- (XT + t(XT))/2
s <- try(expr = chol(XT), silent = TRUE)
if (class(s) != "try-error")
break
z <- z + 1
}
attr(s, "forced") <- (z > 0)
if (warn && z > 0)
warning("Cholesky decomposition had to be forced",
call. = FALSE)
return(s)
}
if (intercept) {
x <- cbind(1, as.matrix(x))
type <- c(2, type)
}
n.iter <- 100
nry <- !ry
n.class <- length(unique(x[, type == (-2)]))
if (n.class == 0)
stop("No class variable")
gf.full <- factor(x[, type == (-2)], labels = 1:n.class)
gf <- gf.full[ry]
XG <- split.data.frame(as.matrix(x[ry, type == 2]), gf)
X.SS <- lapply(XG, crossprod)
yg <- split(as.vector(y[ry]), gf)
n.g <- tabulate(gf)
n.rc <- ncol(XG[[1]])
bees <- matrix(0, nrow = n.class, ncol = n.rc)
ss <- vector(mode = "numeric", length = n.class)
mu <- rep(0, n.rc)
inv.psi <- diag(1, n.rc, n.rc)
inv.sigma2 <- rep(1, n.class)
sigma2.0 <- 1
theta <- 1
for (iter in 1:n.iter) {
for (class in 1:n.class) {
vv <- sym(inv.sigma2[class] * X.SS[[class]] + inv.psi)
bees.var <- chol2inv(chol(vv))
bees[class, ] <- drop(bees.var %*% (crossprod(inv.sigma2[class] *
XG[[class]], yg[[class]]) + inv.psi %*% mu)) +
drop(rnorm(n = n.rc) %*% chol(sym(bees.var)))
ss[class] <- crossprod(yg[[class]] - XG[[class]] %*%
bees[class, ])
}
mu <- colMeans(bees) + drop(rnorm(n = n.rc) %*% chol(chol2inv(chol(sym(inv.psi)))/n.class))
inv.psi <- rwishart(df = n.class - n.rc - 1, SqrtSigma = chol(chol2inv(chol(sym(crossprod(t(t(bees) -
mu)))))))
inv.sigma2 <- rgamma(n.class, n.g/2 + 1/(2 * theta),
scale = 2 * theta/(ss * theta + sigma2.0))
H <- 1/mean(inv.sigma2)
sigma2.0 <- rgamma(1, n.class/(2 * theta) + 1, scale = 2 *
theta * H/n.class)
G <- exp(mean(log(1/inv.sigma2)))
theta <- 1/rgamma(1, n.class/2 - 1, scale = 2/(n.class *
(sigma2.0/H - log(sigma2.0) + log(G) - 1)))
}
imps <- rnorm(n = sum(nry), sd = sqrt(1/inv.sigma2[gf.full[nry]])) +
rowSums(as.matrix(x[nry, type == 2, drop = FALSE]) * bees[gf.full[nry],
])
return(imps)
}
When a new function is added to the mice namespace on
environment(mice.impute.2l.norm2) <- asNamespace('mice')
it can be used like normal in mice by calling 2l.norm2. To show in the previously described example:
method["weight"] <- "2l.norm2"
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)
It now works as expected!
source to share