R Is there a way to make the threshold in linear regression?

Question


I'm trying to do linear regression, but I only want to use variables with positive coefficients (I think this is called a hard threshold, but I'm not sure).

For example:

> summary(lm1)

Call:
lm(formula = value ~ ., data = intCollect1[, -c(1, 3)])

Residuals:
     Min       1Q   Median       3Q      Max 
-15.6518  -0.2089  -0.0227   0.2035  15.2235 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.099763   0.024360   4.095 4.22e-05 ***
modelNum3802    0.208867   0.008260  25.285  < 2e-16 ***
modelNum8000   -0.086258   0.013104  -6.582 4.65e-11 ***
modelNum8001   -0.058225   0.010741  -5.421 5.95e-08 ***
modelNum8002   -0.001813   0.012087  -0.150 0.880776    
modelNum8003   -0.083646   0.011015  -7.594 3.13e-14 ***
modelNum8004    0.002521   0.010729   0.235 0.814254    
modelNum8005    0.301286   0.011314  26.630  < 2e-16 ***

In the above regression, I would like to use only models 3802, 8004, and 8005 (the ones with positive estimates). Is there a way to do this without copying and pasting each variable name?
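A minimal base-R sketch of the selection step asked about here: read the positive estimates off coef(), then build the reduced formula with reformulate() instead of typing the names. It assumes each predictor is its own numeric column; the data frame dat and the columns x1 to x4 are hypothetical. If modelNum is actually a single factor, names such as modelNum3802 are dummy-variable labels rather than column names, and the refit would need adapting. Dropping the negative terms and refitting is also only a heuristic: the remaining estimates can shift and are not guaranteed to stay positive.

```r
# Hypothetical data: four numeric predictors, two with negative true effects
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$value <- 0.5 * dat$x1 - 0.4 * dat$x2 + 0.3 * dat$x3 - 0.6 * dat$x4 +
  rnorm(n, sd = 0.1)

fit_all <- lm(value ~ ., data = dat)

# Names of the predictors with a positive estimate (intercept dropped)
est <- coef(fit_all)[-1]
keep <- names(est)[est > 0]

# Build "value ~ x1 + x3" from the names instead of typing them out
fit_pos <- lm(reformulate(keep, response = "value"), data = dat)
summary(fit_pos)
```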


r linear-regression threshold

screechOwl 28 Mar 12 at 21:15


3 answers

flodel · Answer 1 · 2012-03-29T01:22:42+0000

Instead of using lm, you can formulate your problem as a quadratic program:

minimize the sum of squared residuals while constraining your linear coefficients to be non-negative.
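Written out as a formula (the notation is mine, not the answer's: A is the n × p matrix of predictor columns, b the response vector, x the coefficient vector), the problem is the one below. In limSolve's lsei notation the inequality constraints are written G x ≥ H, so choosing G = I and H = 0 encodes exactly x ≥ 0.

```latex
\min_{x \in \mathbb{R}^p} \; \lVert A x - b \rVert_2^2
\quad \text{subject to} \quad x_i \ge 0, \qquad i = 1, \dots, p
```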

Such problems can be solved with lsei from the limSolve package. For your example, it would look something like this:

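library(limSolve)

# Predictor columns whose coefficients must be >= 0 (list the rest in place of ...)
x.variables <- c("modelNum3802", "modelNum8000", ...)
num.var <- length(x.variables)

# lsei minimizes ||A x - B||^2 subject to G x >= H; G = I, H = 0 means x >= 0
fit <- lsei(A = as.matrix(intCollect1[, x.variables]),
            B = intCollect1$value,
            G = diag(num.var),
            H = rep(0, num.var))
fit$X  # non-negative coefficient estimates (no intercept term)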
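If you would rather not add a package, the same constrained problem can be sketched with base R's optim() using box constraints (method = "L-BFGS-B" with lower = 0). This is a swapped-in solver, not what lsei does internally; the matrix A, the response b and the helpers rss and rss_grad are made up for illustration, and, as in the lsei call, there is no intercept column.

```r
# Hypothetical design matrix (3 predictors) and response; the true
# coefficient on the second column is negative
set.seed(42)
n <- 300
A <- matrix(rnorm(n * 3), ncol = 3)
b <- drop(A %*% c(0.8, -0.5, 0.3)) + rnorm(n, sd = 0.1)

# Residual sum of squares ||A beta - b||^2 and its gradient 2 A'(A beta - b)
rss <- function(beta) sum((A %*% beta - b)^2)
rss_grad <- function(beta) drop(2 * crossprod(A, A %*% beta - b))

# Box-constrained quasi-Newton: every coefficient must stay >= 0
fit <- optim(par = rep(0.1, ncol(A)), fn = rss, gr = rss_grad,
             method = "L-BFGS-B", lower = rep(0, ncol(A)))
fit$par
```

Because the second true effect is negative, its estimate should end up on the bound at 0, while the other two should match an ordinary least-squares fit on the remaining columns.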

screechOwl · Answer 2 · 2013-05-03T12:53:31+0000

I found the nnls (non-negative least squares) package, which is worth a look.
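With that package the fit is a single call, nnls::nnls(A, b), and the estimates are in the returned object's $x component. To show what it solves, here is a dependency-free sketch of the same non-negative least-squares problem using cyclic coordinate descent. This is a different, simpler algorithm than the Lawson-Hanson active-set method the nnls package wraps, and the function nnls_cd and the simulated data are hypothetical.

```r
# Cyclic coordinate descent for: minimize ||A x - b||^2 subject to x >= 0
nnls_cd <- function(A, b, tol = 1e-10, max_iter = 10000) {
  x <- rep(0, ncol(A))
  r <- b                    # residual b - A %*% x, with x starting at 0
  col_sq <- colSums(A^2)    # squared norm of each column
  for (iter in seq_len(max_iter)) {
    max_step <- 0
    for (j in seq_len(ncol(A))) {
      # Exact one-dimensional minimizer in x[j], clipped at zero
      new_xj <- max(0, x[j] + sum(A[, j] * r) / col_sq[j])
      step <- new_xj - x[j]
      if (step != 0) {
        r <- r - step * A[, j]   # keep the residual in sync with x
        x[j] <- new_xj
        max_step <- max(max_step, abs(step))
      }
    }
    if (max_step < tol) break   # no coordinate moved noticeably
  }
  x
}

# Hypothetical data: the true effect of column 2 is negative
set.seed(7)
A <- matrix(rnorm(200 * 3), ncol = 3)
b <- drop(A %*% c(1.0, -0.7, 0.4)) + rnorm(200, sd = 0.05)
x_hat <- nnls_cd(A, b)
x_hat
```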

John Jiang · Answer 3 · 2014-01-10T18:23:41+0000

You can also reformulate your linear regression model as follows:

label ~ $\sum_i e^{\alpha_i} f_i$

The optimization target would then be (with $f_{ij}$ the value of feature $i$ for observation $j$)

$\sum_j \bigl(\mathrm{label}_j - \sum_i e^{\alpha_i} f_{ij}\bigr)^2$

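This has no closed-form solution, but it is a smooth, unconstrained problem that a general-purpose optimizer can solve. Strictly speaking it is not convex in the $\alpha_i$ in general; however, because $\alpha \mapsto e^{\alpha}$ is a smooth bijection onto the positive orthant, any point where the gradient in $\alpha$ vanishes also has zero gradient in $\beta_i = e^{\alpha_i}$, where the least-squares objective is convex, so there are no spurious local minima.

Once you have computed the $\alpha_i$, exponentiate them: $\beta_i = e^{\alpha_i}$ are ordinary linear-model coefficients, all strictly positive. A variable that "wants" a negative coefficient is driven towards zero ($\alpha_i \to -\infty$) rather than reaching exactly zero.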
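A minimal sketch of this reparametrization in base R, assuming simulated data (f, label, obj and obj_grad are hypothetical names): optimize over alpha with optim(method = "BFGS"), then exponentiate. The objective below is the formula above divided by n, which has the same minimizer but keeps the optimizer's first trial step moderate.

```r
# Hypothetical data: 3 features, the second with a negative true effect
set.seed(3)
n <- 250
f <- matrix(rnorm(n * 3), ncol = 3)
label <- drop(f %*% c(2.0, -1.0, 0.5)) + rnorm(n, sd = 0.05)

# (1/n) * sum_j (label_j - sum_i exp(alpha_i) f_ij)^2
obj <- function(alpha) mean((label - f %*% exp(alpha))^2)
obj_grad <- function(alpha) {
  beta <- exp(alpha)
  # d/d alpha_i = -(2/n) * exp(alpha_i) * sum_j f_ij * residual_j
  drop(-2 * beta * crossprod(f, label - f %*% beta) / n)
}

fit <- optim(par = rep(0, 3), fn = obj, gr = obj_grad,
             method = "BFGS", control = list(maxit = 1000))
beta_hat <- exp(fit$par)   # back-transform to ordinary linear coefficients
beta_hat
```

The coefficient whose true effect is negative should come out tiny but still positive, as expected from exp().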
