Backward Elimination in Logistic Regression Using R
I am fitting a logistic regression in R and using backward elimination to get my final model:
FulMod2 <- glm(surv~as.factor(tdate)+as.factor(tdate)+as.factor(sline)+as.factor(pgf)
+as.factor(weight5)+as.factor(backfat5)+as.factor(srect2)
+as.factor(bcs)+as.factor(loco3)+as.factor(fear3)
+as.factor(teats)+as.factor(preudder)+as.factor(postudder)
+as.factor(colos)+as.factor(tb5) +as.factor(respon3)
+as.factor(feed5)+as.factor(bwt5)+as.factor(sex)
+as.factor(fos2)+as.factor(gest3)+as.factor(int3),
family=binomial(link="logit"),data=sof)
When I try to run the backward elimination:
step(FulMod2,direction="backward",trace=FALSE)
I got this error message:
Error in step(FulMod2, direction = "backward", trace = FALSE) :
number of rows in use has changed: remove missing values?
This is the second model I have run with backward elimination. The first model worked fine when I ran the backward elimination to get my final model.
Any help would be much appreciated!
Baz
To run step() successfully on your model for backward selection, you must first remove the cases in sof that have missing data in any of the variables being tested.
myForm <- as.formula(surv~
as.factor(tdate)+as.factor(tdate)+as.factor(sline)+as.factor(pgf)
+as.factor(weight5)+as.factor(backfat5)+as.factor(srect2)
+as.factor(bcs)+as.factor(loco3)+as.factor(fear3)
+as.factor(teats)+as.factor(preudder)+as.factor(postudder)
+as.factor(colos)+as.factor(tb5) +as.factor(respon3)
+as.factor(feed5)+as.factor(bwt5)+as.factor(sex)
+as.factor(fos2)+as.factor(gest3)+as.factor(int3))
sofNoMis <- sof[which(complete.cases(sof[,all.vars(myForm)])),]
FulMod2 <- glm(myForm,family=binomial(link="logit"),data=sofNoMis)
step(FulMod2,direction="backward",trace=FALSE)
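To see why the error appears, here is a minimal sketch on made-up data (the data frame toy and the variables y, x1 and x2 are hypothetical, not from sof). Assuming the default na.action (na.omit), glm() silently drops every row with an NA in a variable used by the formula, so a model that loses a predictor with missing values is refitted on more rows than the full model. step() detects that change and stops.

```r
# Hypothetical toy data: x2 has 10 missing values, x1 and y have none
set.seed(42)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(0.8 * toy$x1))
toy$x2[1:10] <- NA

# With the default na.action, each fit uses only rows complete
# for the variables in its own formula
full    <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = toy)
reduced <- glm(y ~ x1,      family = binomial(link = "logit"), data = toy)
nobs(full)     # 90
nobs(reduced)  # 100

# Restricting to complete cases first keeps the row count fixed,
# so backward elimination can compare the models
toyNoMis  <- toy[complete.cases(toy[, c("y", "x1", "x2")]), ]
fullNoMis <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = toyNoMis)
final     <- step(fullNoMis, direction = "backward", trace = FALSE)
nobs(final)    # 90
```

Comparing AIC across fits only makes sense when they use the same observations, which is why step() refuses to continue rather than quietly switching data sets.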
In your comment, you mentioned that 1 in 9 cases have missing data. However, I recommend checking this again with the code above, in case some of that missingness is in variables not included in FulMod2. If you still have many incomplete cases, you can decide a priori whether to drop some of the variables with a high proportion of missing values.
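One way to check where the missing values sit is to count them per model variable and compare the number of incomplete cases for the model variables against the whole data frame. The sketch below uses a small made-up data frame (toy, with a hypothetical column unused that is not in the model) so that it runs on its own; with your data you would substitute sof and myForm.

```r
# Hypothetical stand-in for sof; 'unused' is not part of the model
toy <- data.frame(
  surv   = rep(c(0, 1), 10),
  sex    = rep(c("m", "f"), each = 10),
  bcs    = rep(1:4, 5),
  unused = seq_len(20)
)
toy$bcs[c(2, 5, 9)]  <- NA
toy$sex[5]           <- NA
toy$unused[c(1, 3)]  <- NA

form <- surv ~ as.factor(sex) + as.factor(bcs)
vars <- all.vars(form)   # "surv" "sex" "bcs"

# Missing values per model variable, largest first
naPerVar <- sort(colSums(is.na(toy[, vars])), decreasing = TRUE)
naPerVar

# Incomplete cases: model variables only vs. every column
nIncompleteModel <- sum(!complete.cases(toy[, vars]))  # 3
nIncompleteAll   <- sum(!complete.cases(toy))          # 5
```

If a handful of variables account for most of the lost rows, those are the candidates to consider leaving out before fitting the full model.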