R: error in 2l.norm when there is only 1 NA

After specifying the "2l.norm" method when invoking the mice, I came across an error message for variables containing only 1 NA. I understand this is a very minor issue given the very small amount of missing data for these variables. However, it would be elegant to consider the data structure for them.

I recovered the situation using a publicly accessible database, the ChickWeight dataset. I am very aware that the problem could also be due to a bug in my procedure implementation, so please let me know if this is the case.

ChickWeight[1:20, ]
dim(ChickWeight)
sum(is.na(ChickWeight)) #contains no NAs
ChickWeight$weight[12] <- NA # add 1 NA
ChickWeight$constant <- 1 #add a constant
ChickWeight$Chick <- as.numeric(levels(ChickWeight$Chick)[ChickWeight$Chick]) #class variable has to be an integer

ini <- mice(ChickWeight, maxit = 0)
pred <- ini$predictorMatrix
pred["weight", ] <- c(0, 2, -2, 1, 2)
method <- ini$method
method["weight"] <- "2l.norm"
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)

      

The last command results in:

Error in [<-.data.frame

( *tmp*

, i, value = c (37.3233463394145, 159.862324738397: replacement has 2 rows, data has 1

Adding one extra NA solves the problem

ChickWeight$weight[13] <- NA # add another NA
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)

      

Does anyone know what might be causing the error?

+3


source to share


1 answer


The answer was kindly provided to me by email from Stef van Buuren, the author of the mouse pack. There is a very minor omission in the mice.impute.2l.norm function in the current version of the mouse package (2.22) and will be fixed in the next version.

The correct code is as follows: in the fourth from the last row, only "drop = FALSE" is added to rowSums (as.matrix (x [nry, type == 2):

mice.impute.2l.norm2 <- 
function (y, ry, x, type, intercept = TRUE, ...) 
{
  rwishart <- function(df, p = nrow(SqrtSigma), SqrtSigma = diag(p)) {
    Z <- matrix(0, p, p)
    diag(Z) <- sqrt(rchisq(p, df:(df - p + 1)))
    if (p > 1) {
      pseq <- 1:(p - 1)
      Z[rep(p * pseq, pseq) + unlist(lapply(pseq, seq))] <- rnorm(p * 
                                                                    (p - 1)/2)
    }
    crossprod(Z %*% SqrtSigma)
  }
  force.chol <- function(x, warn = TRUE) {
    z <- 0
    repeat {
      lambda <- 0.1 * z
      XT <- x + diag(x = lambda, nrow = nrow(x))
      XT <- (XT + t(XT))/2
      s <- try(expr = chol(XT), silent = TRUE)
      if (class(s) != "try-error") 
        break
      z <- z + 1
    }
    attr(s, "forced") <- (z > 0)
    if (warn && z > 0) 
      warning("Cholesky decomposition had to be forced", 
              call. = FALSE)
    return(s)
  }
  if (intercept) {
    x <- cbind(1, as.matrix(x))
    type <- c(2, type)
  }
  n.iter <- 100
  nry <- !ry
  n.class <- length(unique(x[, type == (-2)]))
  if (n.class == 0) 
    stop("No class variable")
  gf.full <- factor(x[, type == (-2)], labels = 1:n.class)
  gf <- gf.full[ry]
  XG <- split.data.frame(as.matrix(x[ry, type == 2]), gf)
  X.SS <- lapply(XG, crossprod)
  yg <- split(as.vector(y[ry]), gf)
  n.g <- tabulate(gf)
  n.rc <- ncol(XG[[1]])
  bees <- matrix(0, nrow = n.class, ncol = n.rc)
  ss <- vector(mode = "numeric", length = n.class)
  mu <- rep(0, n.rc)
  inv.psi <- diag(1, n.rc, n.rc)
  inv.sigma2 <- rep(1, n.class)
  sigma2.0 <- 1
  theta <- 1
  for (iter in 1:n.iter) {
    for (class in 1:n.class) {
      vv <- sym(inv.sigma2[class] * X.SS[[class]] + inv.psi)
      bees.var <- chol2inv(chol(vv))
      bees[class, ] <- drop(bees.var %*% (crossprod(inv.sigma2[class] * 
                                                      XG[[class]], yg[[class]]) + inv.psi %*% mu)) + 
        drop(rnorm(n = n.rc) %*% chol(sym(bees.var)))
      ss[class] <- crossprod(yg[[class]] - XG[[class]] %*% 
                               bees[class, ])
    }
    mu <- colMeans(bees) + drop(rnorm(n = n.rc) %*% chol(chol2inv(chol(sym(inv.psi)))/n.class))
    inv.psi <- rwishart(df = n.class - n.rc - 1, SqrtSigma = chol(chol2inv(chol(sym(crossprod(t(t(bees) - 
                                                                                                  mu)))))))
    inv.sigma2 <- rgamma(n.class, n.g/2 + 1/(2 * theta), 
                         scale = 2 * theta/(ss * theta + sigma2.0))
    H <- 1/mean(inv.sigma2)
    sigma2.0 <- rgamma(1, n.class/(2 * theta) + 1, scale = 2 * 
                         theta * H/n.class)
    G <- exp(mean(log(1/inv.sigma2)))
    theta <- 1/rgamma(1, n.class/2 - 1, scale = 2/(n.class * 
                                                     (sigma2.0/H - log(sigma2.0) + log(G) - 1)))
  }
  imps <- rnorm(n = sum(nry), sd = sqrt(1/inv.sigma2[gf.full[nry]])) + 
    rowSums(as.matrix(x[nry, type == 2, drop = FALSE]) * bees[gf.full[nry], 
                                                              ])
  return(imps)
}

      

When a new function is added to the mice namespace on



environment(mice.impute.2l.norm2) <- asNamespace('mice')

      

it can be used like normal in mice by calling 2l.norm2. To show in the previously described example:

method["weight"] <- "2l.norm2"
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)

      

It now works as expected!

+2


source







All Articles