Multiple imputation on new / prediction data

Can someone please help me understand how to handle missing values in new / unseen data? I have looked at several imputation packages in R, and all of them seem to impute only the training and test sets (at the same time). How do you process new, unlabeled data for scoring in the same way as the train / test data? Basically, I want to use multiple imputation (MI) for missing values in the training / test set, and the same model / method for the prediction data. Based on my reading on multiple imputation (I am not an expert), it does not seem possible to do this with MI? With caret's functions, by contrast, you can easily apply the same model that was fitted on the train / test data to new data. Any help would be greatly appreciated. Thanks.
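For illustration, here is a minimal sketch of the caret behaviour described above: imputation values are learned from the training data and then reused on new data. It assumes the caret package is installed, uses iris as a stand-in dataset with a few values knocked out by hand, and uses median imputation, which is single imputation rather than MI.

```r
library(caret)

# Toy data: iris stands in for the real dataset (an assumption).
train_x <- iris[1:100, 1:4]
new_x   <- iris[101:150, 1:4]

# Knock out a few values so there is something to impute.
train_x$Sepal.Length[c(3, 17)] <- NA
new_x$Sepal.Length[1] <- NA
new_x$Petal.Width[5]  <- NA

# Learn the imputation values (column medians) from the training data only.
pp <- preProcess(train_x, method = "medianImpute")

# Apply the stored training medians to the new data.
new_imputed <- predict(pp, new_x)
```

The point is that `pp` is a fitted object that can be saved and applied to any later batch of data, which is the property being asked for in an MI setting.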

** Edit

Basically, my dataset contains a lot of missing values. Deleting incomplete rows is not an option, as it would drop most of my train / test set. Up to this point, I have encoded the categorical variables and removed near-zero-variance and highly correlated variables. After this preprocessing, I was able to easily apply imputation with the mice package:

library(mice)
m <- mice(sg.enc)

      

At this point, I could fit the model on the imputed datasets and combine the results with the pool command. This works great. However, I know that future data will also have missing values, and I would like to somehow apply this MI to that data as it arrives?
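A rough sketch of both steps, under stated assumptions: the nhanes data that ships with mice stands in for sg.enc, the model formula is made up for illustration, and the installed mice version is assumed to be recent enough to provide the `ignore` argument of `mice()`. Rows flagged in `ignore` are imputed but are not used to build the imputation models, which is one possible way to treat future data.

```r
library(mice)

# nhanes ships with mice; the split into "train" and "new" is artificial.
train <- nhanes[1:20, ]
new   <- nhanes[21:25, ]

# Step 1: impute the training data, fit a model per completed dataset, pool.
imp_train <- mice(train, m = 5, seed = 123, printFlag = FALSE)
fits      <- with(imp_train, lm(bmi ~ age + chl))  # hypothetical model
pooled    <- pool(fits)

# Step 2: impute new rows without letting them influence the imputation model.
combined <- rbind(train, new)
ignore   <- c(rep(FALSE, nrow(train)), rep(TRUE, nrow(new)))
imp_all  <- mice(combined, m = 5, ignore = ignore, seed = 123,
                 printFlag = FALSE)

# One completed copy of the new rows per imputation.
new_completed <- lapply(1:5, function(i) complete(imp_all, i)[ignore, ])
```

Each element of `new_completed` can then be scored with the corresponding model and the predictions averaged; whether that is appropriate depends on the analysis.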



1 answer


It doesn't do multiple imputation, but the yaImpute package has a predict() function to impute values for new data. I ran a test using training data (including NAs) to create a "yai" object and then used that object via predict() to impute values in a new test dataset. Unlike caret's preProcess(), yaImpute supports factor variables (at least for imputing values for them) in its kNN algorithm. I have not yet tested whether factors can be part of the "predictors" for missing targets. yaImpute also supports other imputation methods besides kNN.
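A minimal sketch of this workflow, assuming the yaImpute package is installed and using iris as stand-in data. It uses the explicit two-step form, `newtargets()` followed by `impute()`, rather than `predict()`; whether `predict()` wraps the same steps is not verified here. Petal.Width and the factor Species play the role of the variables to impute.

```r
library(yaImpute)

set.seed(12345)

# Reference (training) rows have both predictors (x) and targets (y).
refs  <- sample(rownames(iris), 50)
train <- iris[refs, ]
new   <- iris[!(rownames(iris) %in% refs), ]

x <- train[, 1:3]  # Sepal.Length, Sepal.Width, Petal.Length
y <- train[, 4:5]  # Petal.Width (numeric) and Species (factor)

# Build the "yai" object from the training data only.
fit <- yai(x = x, y = y, method = "mahalanobis")

# Find nearest reference neighbours for the new rows, then impute y for them.
fit_new <- newtargets(fit, newdata = new)
imputed <- impute(fit_new, vars = yvars(fit_new))
```

Here `imputed` holds the values borrowed from each new row's nearest reference observation, including the factor column.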


