Prediction after Boxcox model in Stata
I am trying to reproduce the predictions that predict returns after boxcox
in Stata 13 with my own code, following the steps outlined in the Stata manual (page 5).
Below is the sample code I used:
sysuse auto, clear
local indepvar weight foreign length
qui boxcox price `indepvar', model(lhsonly) lrtest
qui predict yhat1
qui predict resid1, residuals
//yhat2 and resid2 computed using the procedure described in Stata manual
set more off
set type double
mat coef=e(b)
local nosvar=colsof(coef)-2
qui gen constant=1
local varname weight foreign length constant
local coefname weight foreign length _cons
//step 1: compute residuals first
forvalues k = 1/`nosvar' {
    local varname1 : word `k' of `varname'
    local coefname1 : word `k' of `coefname'
    qui gen xb`varname1'=`varname1'*_b[`coefname1']
}
qui egen xb=rowtotal(xb*)
qui gen resid=(price^(_b[theta:_cons]))-xb
//step 2: compute predicted value
qui gen yhat2=.
local noobs=_N
local theta=_b[theta:_cons]
forvalues j=1/`noobs' {
    qui gen temp`j'=.
    forvalues i=1/`noobs' {
        qui replace temp`j'=((`theta'*(xb[`j']+resid[`i']))+1)^(1/`theta') if _n==`i'
    }
    qui sum temp`j'
    local tempmean`j'=r(mean)
    qui replace yhat2=`tempmean`j'' if _n==`j'
    drop temp`j'
}
drop resid
qui gen double resid2=price-yhat2
sum yhat* resid*
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       yhat1 |        74    6254.224    2705.175   3428.361   21982.45
       yhat2 |        74    1.000035    8.13e-06   1.000015   1.000054
      resid1 |        74   -88.96723    2094.162  -10485.45   6980.013
      resid2 |        74    6164.257    2949.496       3290      15905
Note: yhat1 and resid1 come from Stata's predict, whereas yhat2 and resid2 come from my sample code. I need the comparison to make sure that the marginal effect I have calculated is correct (margins does not calculate marginal effects after boxcox).
Your definition of the residuals in step 1 is incorrect because it skips the definition of y^(\lambda) given on page 3 of the manual. See also the Methods and formulas section of the manual entry for boxcox.
Applied to your problem, in the line
qui gen resid=(price^(_b[theta:_cons]))-xb
the term
price^(_b[theta:_cons])
should be:
(price^(_b[theta:_cons])-1)/_b[theta:_cons]
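For reference, the two steps in the question's code can be written out as below. This is only a sketch in the notation of that code, not the manual's own notation: \theta stands for _b[theta:_cons], x_i b for xb, e_i for the step 1 residual, and N for _N. It assumes the left-hand-side-only model with \theta \neq 0.

```latex
\begin{align}
  y_i^{(\theta)} &= \frac{y_i^{\theta} - 1}{\theta}, \\
  e_i &= y_i^{(\theta)} - x_i b, \\
  \hat{y}_j &= \frac{1}{N} \sum_{i=1}^{N}
      \bigl( \theta \, (x_j b + e_i) + 1 \bigr)^{1/\theta}.
\end{align}
```

The last expression matches the inner loop of step 2, which applies the inverse of the first line to x_j b + e_i. If e_i is computed from price^theta instead of the transformed value in the first line, that inverse no longer undoes the transformation, so yhat2 does not end up on the scale of price.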