How does the threshold for LinearSVC's transform work?
I am using LinearSVC as a preprocessing step for my decision tree classifier. I fit LinearSVC and then call transform(X). I notice that the number of features is reduced from about 35 to 9. I would like to know which features were selected.
I know that by default transform(X) works with threshold='mean'. Can anyone give me an example of how it determines whether to keep a feature or not?
This is my coef_:
array([[ -2.45022173e-01, -8.61032928e-02, -2.39513401e-03,
-2.07443644e-02, 2.49547244e-03, -3.14133367e-02,
7.09627000e-03, 3.94563929e-03, 6.78145800e-02,
1.59497586e-01, -1.24063075e-01, -4.79223418e-02,
-3.70412138e-02, 4.39187481e-02, 1.30004636e-02,
-2.31911643e-03, -1.63937709e-03, -2.18402321e-03,
-2.65601394e-03, 1.48259224e-02, -6.15157373e-02,
-3.65242492e-04, 8.10479000e-02, -1.58338535e-01,
5.06225924e-03, 1.16183358e-03, 6.44170055e-02,
-2.56651350e-03, 1.62029008e-01, -1.69785296e+00,
-1.91045465e+00, -1.64206237e+00, -1.80735175e+00,
-1.39504546e+00, -1.66709852e+00],
[ 4.14083584e-01, 2.03703885e-01, 4.82783739e-03,
7.90756359e-02, -1.45063508e-03, 1.05486236e-01,
-3.01145160e-01, -7.81145855e-03, -3.39445309e-01,
-5.66603101e-01, 2.41489561e-01, 3.11615301e-01,
-3.59607168e-01, -4.04092005e-01, -3.18262477e-03,
8.14224001e-04, 8.64216590e-04, 6.59107091e-03,
5.48336293e-03, -1.76329713e-02, 2.33854833e-01,
-1.00455178e-01, -5.00175471e-02, 4.81448974e-02,
3.13891484e-01, 3.54014313e-03, 3.32840843e-01,
6.85018177e-05, -6.75410702e-01, -1.03258781e-01,
2.59870671e-01, -3.03956500e-01, -1.58732859e-01,
-3.89772985e-01, -2.55624888e-01],
[ 1.06132321e-01, 1.23617156e-01, 1.40819416e-03,
1.06118853e-01, 5.11221833e-04, -1.68780545e-01,
9.27425326e-02, 3.52220207e-03, 2.12134293e-01,
3.54667378e-01, 1.22840976e-01, -4.21232679e-01,
3.55037449e-01, -2.06715803e-01, 6.18856581e-02,
-4.63662372e-03, -5.04710160e-04, -4.65594740e-04,
1.01529235e-02, 1.15598254e-03, 4.49951214e-02,
2.20830485e-01, -1.01269555e-01, 3.03514605e-01,
-1.27056578e-01, -2.17123757e-02, -2.51044202e-01,
7.19562937e-03, -6.74304600e-01, 2.47410746e-01,
-7.76792375e-02, 2.26260621e-01, 3.83972532e-01,
4.35143804e-01, 3.50074110e-02],
[ 6.33038442e-02, 3.71367520e-01, -1.21238483e-02,
-5.92230089e-02, -2.69617795e-03, 2.44885573e-01,
-1.12043386e-01, -1.05526224e-01, -9.88583026e-02,
-6.09121814e-01, -5.16313417e-01, 2.83500385e-01,
2.04390765e-01, 9.13454922e-01, 2.12522482e-02,
4.67960378e-03, 3.78514732e-03, -1.89184862e-03,
-2.35710741e-02, 2.77863999e-02, 5.93172013e-01,
-3.98200956e-01, 2.04199614e-01, -6.20399607e-02,
1.19732985e-01, 1.16674647e-01, -1.27517918e-03,
-4.23253804e-03, -1.82480535e+00, 9.29959444e-01,
1.21162165e+00, 1.09899835e+00, 7.42987354e-01,
9.61956169e-01, 8.72089435e-01],
[ 2.98336593e-01, 1.36166556e-01, 8.55303000e-04,
1.13137553e-01, -4.11417197e-03, 2.59650136e-01,
7.87008264e-02, 7.22415689e-03, -3.64334467e-02,
-2.57473176e-02, -1.01132206e-01, -4.52864069e-02,
8.62911851e-03, -1.01396648e-01, -1.71810251e-01,
2.87556170e-02, -5.75335168e-03, -1.31809609e-03,
2.27847222e-02, -1.64198532e-02, -8.11859436e-03,
-2.60700154e-02, 1.74207263e-01, 1.10324971e-01,
6.65055594e-02, 4.11639440e-03, -9.68050856e-02,
4.32464307e-02, 1.26432150e+00, 2.80210335e-02,
1.30525549e-01, 4.34196521e-01, -2.46460632e-01,
3.85467301e-01, -2.58179093e-02]])
I have read the documentation, but I'm not sure how this "mean" is calculated. Is it the mean of the coefficients? If I have 5 classes and 35 features, the coefficient for a given feature will be different for each class. Should I average each feature's coefficients across the classes, and then take the average of those per-feature values?
From the documentation:
The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept, while the others are discarded. If "median" (resp. "mean"), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. If None and if available, the object attribute threshold is used. Otherwise, "mean" is used by default.
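To make the threshold options in that passage concrete, here is a minimal sketch in plain Python. The helper `resolve_threshold` and the toy importance values are hypothetical and only illustrate the documented rules ("mean", "median", or a scaled form such as "1.25*mean"); this is not scikit-learn's actual code.

```python
from statistics import mean, median


def resolve_threshold(importances, threshold="mean"):
    """Hypothetical helper: turn a threshold spec into a plain number."""
    # A numeric threshold is used as given.
    if isinstance(threshold, (int, float)):
        return float(threshold)

    # Strings may carry a scaling factor, e.g. "1.25*mean".
    scale, reference = 1.0, threshold.strip()
    if "*" in reference:
        factor, reference = reference.split("*")
        scale, reference = float(factor), reference.strip()

    if reference == "mean":
        return scale * mean(importances)
    if reference == "median":
        return scale * median(importances)
    raise ValueError("expected 'mean', 'median' or a scaled form of either")


# Toy importances, one value per feature (made up for illustration).
importances = [1.0, 2.0, 3.0, 10.0]

t_mean = resolve_threshold(importances, "mean")
t_median = resolve_threshold(importances, "median")
t_scaled = resolve_threshold(importances, "1.25*mean")

# A feature is kept when its importance is >= the threshold.
kept_with_mean = [i for i, v in enumerate(importances) if v >= t_mean]
kept_with_median = [i for i, v in enumerate(importances) if v >= t_median]
```

With these toy values the mean threshold keeps only the last feature, while the median threshold keeps the last two, which shows how strongly a few large importances can pull the mean upward.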
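The importance here is determined by the coefficients (coef_). Since there is one row of coefficients per class, the per-class values for each feature are first combined into a single importance, and that importance is what gets compared with the threshold.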
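As a rough illustration of that idea, the sketch below assumes that a feature's importance is the sum of the absolute values of its coefficients across all classes; as far as I know that is what scikit-learn does for a 2-D coef_, but check the source of your installed version. The small 2-class, 4-feature matrix is made up so the arithmetic is easy to follow, and it uses plain Python so it runs without any dependencies.

```python
# Toy coefficient matrix: one row per class, one column per feature.
# (Made-up values; the real coef_ in the question is 5 x 35.)
coef = [
    [0.5, -0.25, 0.0, 2.0],
    [-1.5, 0.25, 0.125, -1.0],
]

n_features = len(coef[0])

# Assumed importance: sum of |coefficient| over all classes, per feature.
importances = [
    sum(abs(row[j]) for row in coef) for j in range(n_features)
]

# threshold='mean' -> the mean of those per-feature importances.
threshold = sum(importances) / n_features

# Keep features whose importance is >= the threshold.
selected = [j for j, imp in enumerate(importances) if imp >= threshold]

print("importances:", importances)
print("threshold:", threshold)
print("selected feature indices:", selected)
```

With NumPy the same importance would be `np.abs(clf.coef_).sum(axis=0)`, where `clf` stands for your fitted LinearSVC. If you wrap the estimator in `SelectFromModel`, calling `get_support(indices=True)` on the fitted selector returns the indices of the kept features directly.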