# R: Is there a way to apply a threshold in linear regression?

I'm trying to do linear regression, but I just want to use variables with positive coefficients (I think this is called a hard threshold, but I'm not sure).

For example:

```
> summary(lm1)

Call:
lm(formula = value ~ ., data = intCollect1[, -c(1, 3)])

Residuals:
     Min       1Q   Median       3Q      Max
-15.6518  -0.2089  -0.0227   0.2035  15.2235

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.099763   0.024360   4.095 4.22e-05 ***
modelNum3802  0.208867   0.008260  25.285  < 2e-16 ***
modelNum8000 -0.086258   0.013104  -6.582 4.65e-11 ***
modelNum8001 -0.058225   0.010741  -5.421 5.95e-08 ***
modelNum8002 -0.001813   0.012087  -0.150 0.880776
modelNum8003 -0.083646   0.011015  -7.594 3.13e-14 ***
modelNum8004  0.002521   0.010729   0.235 0.814254
modelNum8005  0.301286   0.011314  26.630  < 2e-16 ***
```

In the above regression, I would like to use models 3802, 8004, and 8005. Is there a way to do this without copying and pasting each variable name?


Instead of using `lm`, you can formulate your problem as a quadratic program: minimize the sum of squared residuals subject to the constraint that the linear coefficients are non-negative.

Such problems can be solved with `lsei` from the `limSolve` package. For your example, it would look something like this:

```
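library(limSolve)

x.variables <- c("modelNum3802", "modelNum8000", ...)
num.var <- length(x.variables)
# Least squares: minimize ||A x - B||^2 subject to G x >= H, i.e. x >= 0
fit <- lsei(A = as.matrix(intCollect1[, x.variables]),
            B = intCollect1$value,
            G = diag(num.var),
            H = rep(0, num.var))
fit$X  # constrained coefficient estimates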
```
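To illustrate the `lsei` approach end to end, here is a minimal sketch on simulated data. The predictor names `a`, `b`, `c` and the simulated coefficients are made up for the example. The intercept is added as an explicit column of ones and left unconstrained, which is one possible design choice rather than something the call above does.

```r
library(limSolve)  # provides lsei()

set.seed(1)
n <- 200
# Hypothetical predictors standing in for the modelNum* columns
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("a", "b", "c")))
# Simulated response; the true coefficient of "b" is negative
y <- as.vector(0.1 + X %*% c(0.2, -0.1, 0.3) + rnorm(n, sd = 0.1))

A <- cbind(Intercept = 1, X)            # design matrix with an intercept column
G <- diag(ncol(A))[-1, , drop = FALSE]  # constrain the slopes only
H <- rep(0, nrow(G))                    # G %*% coef >= 0

fit <- lsei(A = A, B = y, G = G, H = H)
coefs <- setNames(fit$X, colnames(A))
print(round(coefs, 3))
```

Coefficients that would be negative in an unconstrained fit end up on the boundary at zero, so the variables to keep are those with a strictly positive entry in `coefs`.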


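You can also reparameterize the linear regression so that every coefficient is positive by construction:

$$\text{label} \approx \sum_i \exp(\alpha_i)\, f_i$$

The optimization target is then

$$\sum_j \Bigl(\text{label}_j - \sum_i \exp(\alpha_i)\, f_{ij}\Bigr)^2$$

where $f_{ij}$ is the value of feature $i$ for observation $j$.

This has no closed-form solution, but it can be solved efficiently with a numerical optimizer: the problem is convex in the coefficients $\exp(\alpha_i)$ (though not necessarily in $\alpha_i$ itself).

Once you have computed the $\alpha_i$, you can recover ordinary linear-model coefficients by exponentiating them: $\beta_i = \exp(\alpha_i)$.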
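The reparameterization above can be sketched with base R's `optim`. This is only an illustration on simulated data: the data, the starting values, and the choice of BFGS are assumptions, not part of the original answer. Because `exp()` is strictly positive, a coefficient that a constrained fit would set to zero only shrinks toward zero here.

```r
set.seed(1)
n <- 200
# Simulated features f_ij (rows = observations j, columns = features i)
X <- matrix(rnorm(n * 3), n, 3)
# Simulated labels; the second true coefficient is negative
y <- as.vector(X %*% c(0.2, -0.1, 0.3) + rnorm(n, sd = 0.1))

# Sum of squared errors as a function of alpha, with beta = exp(alpha) > 0
sse <- function(alpha) sum((y - X %*% exp(alpha))^2)

# Gradient by the chain rule: d/d alpha_i = -2 * beta_i * sum_j r_j * f_ij
grad <- function(alpha) {
  beta <- exp(alpha)
  r <- as.vector(y - X %*% beta)
  -2 * beta * as.vector(crossprod(X, r))
}

opt <- optim(par = rep(log(0.1), 3), fn = sse, gr = grad, method = "BFGS")
beta <- exp(opt$par)  # back-transform to ordinary regression coefficients
print(beta)
```

The entries of `beta` are the coefficients of the equivalent linear model; values that are numerically close to zero correspond to variables you would drop.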
