R: Is there a way to apply a threshold in linear regression?
I'm trying to do linear regression, but I just want to use variables with positive coefficients (I think this is called a hard threshold, but I'm not sure).
For example:
> summary(lm1)
Call:
lm(formula = value ~ ., data = intCollect1[, -c(1, 3)])
Residuals:
     Min       1Q   Median       3Q      Max
-15.6518  -0.2089  -0.0227   0.2035  15.2235
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.099763   0.024360   4.095 4.22e-05 ***
modelNum3802  0.208867   0.008260  25.285  < 2e-16 ***
modelNum8000 -0.086258   0.013104  -6.582 4.65e-11 ***
modelNum8001 -0.058225   0.010741  -5.421 5.95e-08 ***
modelNum8002 -0.001813   0.012087  -0.150 0.880776
modelNum8003 -0.083646   0.011015  -7.594 3.13e-14 ***
modelNum8004  0.002521   0.010729   0.235 0.814254
modelNum8005  0.301286   0.011314  26.630  < 2e-16 ***
In the above regression, I would like to use models 3802, 8004, and 8005. Is there a way to do this without copying and pasting each variable name?
Instead of using lm, you can formulate your problem as a quadratic program: minimize the sum of squared residuals subject to the constraint that the linear coefficients are non-negative.
Such problems can be solved with lsei from the limSolve package. For your example, it would look something like this:
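library(limSolve)
# Names of the predictor columns (list elided here)
x.variables <- c("modelNum3802", "modelNum8000", ...)
num.var <- length(x.variables)
# Minimize ||A x - B||^2 subject to G x >= H, i.e. every coefficient >= 0.
# A must be a numeric matrix; no intercept column is included here.
lsei(A = as.matrix(intCollect1[, x.variables]),
     B = intCollect1$value,
     G = diag(num.var),
     H = rep(0, num.var))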
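To see what the constrained fit does, here is a minimal sketch of the same idea on simulated data. It does not use limSolve: as a swapped-in technique it uses base R's optim with method "L-BFGS-B" and box constraints, which is enough for simple lower bounds. The data, the column names x1, x2, x3 and the helpers sse and sse.grad are made up for illustration. Unlike the call above, it also adds an intercept column and leaves the intercept unconstrained.

```r
set.seed(1)

# Simulated stand-in for the real data: x2 has a negative true effect
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 0.1 + 0.5 * X[, "x1"] - 0.3 * X[, "x2"] + 0.2 * X[, "x3"] +
  rnorm(n, sd = 0.1)

# Design matrix with an explicit intercept column
A <- cbind(intercept = 1, X)

# Sum of squared residuals and its gradient with respect to the coefficients
sse <- function(b) sum((y - A %*% b)^2)
sse.grad <- function(b) as.vector(-2 * t(A) %*% (y - A %*% b))

# Box constraints: intercept is free, every slope must be >= 0
fit <- optim(par = rep(0, ncol(A)), fn = sse, gr = sse.grad,
             method = "L-BFGS-B",
             lower = c(-Inf, rep(0, ncol(X))))

coefs <- setNames(fit$par, colnames(A))
print(round(coefs, 3))
```

In this sketch the slope for x2 should end up pinned at the lower bound of zero, which amounts to dropping that variable, while the remaining slopes are re-estimated without it.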
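You can also reparameterize your linear regression model as follows: $\text{label} \sim \sum_i \exp(\alpha_i) f_i$
The optimization target is then $\sum_j \left(\text{label}_j - \sum_i \exp(\alpha_i) f_{ij}\right)^2$
This has no closed-form solution, but it can be solved efficiently with a gradient-based optimizer: the objective is smooth in $\alpha_i$, and it is the ordinary convex least-squares problem in the coefficients $\exp(\alpha_i)$, which are positive by construction.
Once you have computed the $\alpha_i$, you can turn them back into regular linear model coefficients by exponentiating them: $\beta_i = \exp(\alpha_i)$.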
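As a rough sketch of this reparameterization, assuming simulated data and base R's optim with method "BFGS" (the names f, label, objective, gradient and beta.hat are made up for illustration), it could look like this:

```r
set.seed(2)

# Simulated stand-in data; every true coefficient is positive
n <- 200
f <- cbind(f1 = runif(n), f2 = runif(n))
label <- 0.7 * f[, "f1"] + 1.5 * f[, "f2"] + rnorm(n, sd = 0.05)

# Objective in alpha: sum_j (label_j - sum_i exp(alpha_i) * f_ij)^2
objective <- function(alpha) sum((label - f %*% exp(alpha))^2)

# Gradient by the chain rule: d/d alpha_i = exp(alpha_i) * d/d beta_i
gradient <- function(alpha) {
  beta <- exp(alpha)
  as.vector(-2 * t(f) %*% (label - f %*% beta)) * beta
}

# Unconstrained optimization over alpha, starting from beta = 1
fit <- optim(par = c(0, 0), fn = objective, gr = gradient, method = "BFGS")

# Back-transform to ordinary linear coefficients
beta.hat <- setNames(exp(fit$par), colnames(f))
print(round(beta.hat, 3))
```

One caveat of this design: exp(alpha_i) is strictly positive, so a coefficient that should be exactly zero can only be approached as alpha_i tends to minus infinity. If exact zeros matter, a bound-constrained solver is probably the better fit.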