Box-Cox transformation with survey data in R

Does anyone know a way to estimate multivariate Box-Cox transformations with survey data in R? I don't know of anything that takes strata and clusters into account (which is what my data have), but even something that only takes the probability weights into account would be great. My main concern is that the distribution of one or more variables can change once the probability weights are applied, so the estimated transformation can change radically. There may also be implications for the errors, the Box-Cox algorithm, and so on, but that goes beyond a theory-based approach.

Updated question

The R function powerTransform (from the car package) works great, but I don't think it has anything for survey data. I thought Stata would handle this, but as Nick pointed out, it doesn't. The only Box-Cox transformation that handles sampling weights seems to be this one.

Do you know of any R function that can apply both univariate and multivariate Box-Cox transformations to probability-weighted data?

I don't have any particular data in mind; I'm just wondering whether anyone has found a solution for this. Since I know people appreciate a specific example, here is one.

Univariate Box-Cox: powerTransform returns results for both the lm object and the svyglm (survey) object.

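library(survey)
library(car)
data(api)
# Stratified design with sampling weights (pw) and a finite population correction
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)
# Same regression, fitted with and without the survey design
Sur <- svyglm(api00 ~ mobility, design = dstrat)
NotSur <- lm(api00 ~ mobility, data = apistrat)
# Estimate the Box-Cox power for the response from each fitted model
powerTransform(Sur)
powerTransform(NotSur)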

However, I don't think powerTransform is handling the survey object correctly, because you get the same results as NotSur (and different from Sur) when you run

# Design with every weight equal to 1 and no strata
None <- svydesign(id = ~1, weights = rep(1, nrow(apistrat)), data = apistrat)
Sur2 <- svyglm(api00 ~ mobility, design = None)
powerTransform(Sur2)

I'm even less sure how to approach multivariate normality, since there you have to use the actual data rather than a fitted model, for example:

summary(powerTransform(cbind(api00,mobility)~1,apistrat))


2 answers


The link you gave refers to a custom function in SAS that is executed in a data step. It should be possible to reimplement the method in R.
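A reimplementation could look roughly like the following. This is a minimal base R sketch, not a port of the SAS macro: it assumes the weights simply enter the multivariate Box-Cox log-likelihood as case weights (a pseudo-likelihood), it ignores strata and clusters, and the names `bc`, `wmbc_negloglik` and `wmbc_estimate` are hypothetical. The data and weights are simulated purely for illustration.

```r
# Box-Cox transform of a positive vector for one lambda
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Negative weighted log-likelihood (per unit of total weight) for a vector
# of lambdas, one per column of Y. ASSUMPTION: weights act as case weights.
wmbc_negloglik <- function(lambda, Y, w) {
  Z <- sapply(seq_len(ncol(Y)), function(j) bc(Y[, j], lambda[j]))
  S <- cov.wt(Z, wt = w, method = "ML")$cov      # weighted ML covariance
  N <- sum(w)
  logdet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  jac <- sum((lambda - 1) * colSums(w * log(Y)))  # weighted Jacobian term
  (N / 2 * logdet - jac) / N
}

# Point estimate of the lambdas, searched within [-3, 3] for each variable
wmbc_estimate <- function(Y, w) {
  optim(rep(1, ncol(Y)), wmbc_negloglik, Y = Y, w = w,
        method = "L-BFGS-B", lower = -3, upper = 3)$par
}

# Simulated example: column 1 is normal on the log scale,
# column 2 is normal on the square-root scale
set.seed(123)
n <- 2000
e <- matrix(rnorm(2 * n), n, 2)
Y <- cbind(exp(1 + 0.6 * e[, 1]), (5 + e[, 2])^2)
w <- runif(n, 1, 10)                              # made-up sampling weights
lambda_hat <- wmbc_estimate(Y, w)
lambda_hat
```

With real survey data you would pass the analysis variables as `Y` and the design weights as `w`. This only gives point estimates; it says nothing about their sampling variability under the design.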

If you look at the suggested SAS method here, you can see that it uses proc transreg to estimate the required power transformation. This SAS procedure does not accept survey weights. I'm not sure what the weight option in this proc is doing here.



Update: I took a closer look at the first link you gave. It looks like the weighting is done in proc univariate, with the weight option enabled if the data contain weights. However, if you look at the details of the weight statement here, you will see that the weights are used to scale the variances. I'm not sure you want to make that assumption for your data.



Using the weights as in your linked SAS macro should give a good point estimate of the optimal transformation, but will likely give an unreasonable interval estimate, since the log-likelihood ratio will not have its standard chi-squared distribution.
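A minimal sketch of such a weighted point estimate in base R, for a regression like the one in the question. It assumes the weights enter the Box-Cox profile log-likelihood as case weights; the function names (`bc`, `wbc_loglik`, `wbc_estimate`) are hypothetical and the data are simulated.

```r
# Box-Cox transform of a positive vector for one lambda
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Weighted profile log-likelihood of lambda for the model bc(y) ~ X.
# ASSUMPTION: weights are treated as case weights (pseudo-likelihood).
wbc_loglik <- function(lambda, y, X, w) {
  fit <- lm.wfit(X, bc(y, lambda), w)             # weighted least squares
  N   <- sum(w)
  s2  <- sum(w * fit$residuals^2) / N             # weighted residual variance
  -N / 2 * log(s2) + (lambda - 1) * sum(w * log(y))
}

# Maximise the profile log-likelihood over a search interval
wbc_estimate <- function(y, X, w, interval = c(-2, 2)) {
  optimize(wbc_loglik, interval, y = y, X = X, w = w, maximum = TRUE)$maximum
}

# Simulated example in which the log scale (lambda = 0) is the right one
set.seed(1)
n <- 2000
x <- runif(n)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.6))
w <- runif(n, 1, 10)                              # made-up sampling weights
X <- cbind(1, x)                                  # design matrix with intercept
lambda_hat <- wbc_estimate(y, X, w)
lambda_hat
```

Multiplying every weight by the same constant multiplies this log-likelihood by that constant, so the maximiser does not move. The curvature does change, which is why the weight scale matters for any interval read off the likelihood.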



Scaling the weights to sum to the sample size will probably give an interval in the right ballpark, but a correct analogue of the Box and Cox interval construction would require the sampling distribution of the "working" likelihood ratio (as used by the AIC and anova methods for survey::svyglm).
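To illustrate the effect of the weight scale, here is a rough base R sketch for a single variable without covariates. It computes a naive likelihood-ratio interval (chi-squared cutoff, no design correction) once with population-scale weights and once with weights rescaled to sum to the sample size. The function names `prof_ll` and `naive_lr_interval` are hypothetical, the data are simulated, and neither interval is a proper design-based one.

```r
# Weighted profile log-likelihood of lambda for one positive variable.
# ASSUMPTION: weights are treated as case weights (pseudo-likelihood).
prof_ll <- function(lambda, y, w) {
  z  <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  m  <- sum(w * z) / sum(w)                       # weighted mean
  s2 <- sum(w * (z - m)^2) / sum(w)               # weighted variance
  -sum(w) / 2 * log(s2) + (lambda - 1) * sum(w * log(y))
}

# Naive LR interval: grid values whose profile log-likelihood lies within
# qchisq(level, 1) / 2 of the maximum over the grid
naive_lr_interval <- function(y, w, grid = seq(-2, 2, by = 0.01), level = 0.95) {
  ll <- vapply(grid, prof_ll, numeric(1), y = y, w = w)
  range(grid[ll >= max(ll) - qchisq(level, df = 1) / 2])
}

set.seed(42)
n <- 1000
y <- exp(rnorm(n, mean = 2, sd = 0.5))
w <- runif(n, 50, 500)                            # made-up population-scale weights
w_scaled <- w * n / sum(w)                        # rescaled to sum to n

int_raw    <- naive_lr_interval(y, w)             # far too narrow
int_scaled <- naive_lr_interval(y, w_scaled)      # ballpark width only
int_raw
int_scaled
```

With the raw weights the likelihood behaves as if there were sum(w) observations, so the interval collapses around the point estimate. Rescaling restores a plausible width but still does not account for strata, clusters, or unequal weighting.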
