How to specify formula in rfsrc in R

I have a data frame (train3) with 10 numeric variables and a factor.

I would like to build a random forest classifier using the rfsrc function from the randomForestSRC package.

The data looks like this:

summary(train3)



roll_belt        pitch_belt          yaw_belt       total_accel_belt  gyros_belt_x        gyros_belt_y     
 Min.   :-28.90   Min.   :-55.8000   Min.   :-180.00   Min.   : 0.00    Min.   :-1.040000   Min.   :-0.64000  
 1st Qu.:  1.10   1st Qu.:  1.7600   1st Qu.: -88.30   1st Qu.: 3.00    1st Qu.:-0.030000   1st Qu.: 0.00000  
 Median :113.00   Median :  5.2800   Median : -13.00   Median :17.00    Median : 0.030000   Median : 0.02000  
 Mean   : 64.41   Mean   :  0.3053   Mean   : -11.21   Mean   :11.31    Mean   :-0.005592   Mean   : 0.03959  
 3rd Qu.:123.00   3rd Qu.: 14.9000   3rd Qu.:  12.90   3rd Qu.:18.00    3rd Qu.: 0.110000   3rd Qu.: 0.11000  
 Max.   :162.00   Max.   : 60.3000   Max.   : 179.00   Max.   :29.00    Max.   : 2.220000   Max.   : 0.64000  
  gyros_belt_z      accel_belt_x       accel_belt_y     accel_belt_z     classe  
 Min.   :-1.4600   Min.   :-120.000   Min.   :-69.00   Min.   :-275.00   A:5580  
 1st Qu.:-0.2000   1st Qu.: -21.000   1st Qu.:  3.00   1st Qu.:-162.00   B:3797  
 Median :-0.1000   Median : -15.000   Median : 35.00   Median :-152.00   C:3422  
 Mean   :-0.1305   Mean   :  -5.595   Mean   : 30.15   Mean   : -72.59   D:3216  
 3rd Qu.:-0.0200   3rd Qu.:  -5.000   3rd Qu.: 61.00   3rd Qu.:  27.00   E:3607  
 Max.   : 1.6200   Max.   :  85.000   Max.   :164.00   Max.   : 105.00      

      

My rfsrc call looks like this:

fit = rfsrc (classe ~ ., data = train3)

Error in parseFormula(formula, data) : 
  the y-outcome must be either real or a factor.

      

classe does seem to be a factor:

str(train3)

Classes ‘tbl_df’ and 'data.frame':  19622 obs. of  11 variables:
 $ roll_belt       : num  1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
 $ pitch_belt      : num  8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
 $ yaw_belt        : num  -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
 $ total_accel_belt: int  3 3 3 3 3 3 3 3 3 3 ...
 $ gyros_belt_x    : num  0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
 $ gyros_belt_y    : num  0 0 0 0 0.02 0 0 0 0 0 ...
 $ gyros_belt_z    : num  -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
 $ accel_belt_x    : int  -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
 $ accel_belt_y    : int  4 4 5 3 2 4 3 4 2 4 ...
 $ accel_belt_z    : int  22 22 23 21 24 21 21 21 24 22 ...
 $ classe          : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

      

What am I missing? The y-outcome would seem to be classe, which is a factor.



1 answer


I had the same problem, and this helped me a lot: I needed to wrap the data in as.data.frame() inside the rfsrc call for the formula to work. Thanks everyone!
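The as.data.frame fix mentioned above can be sketched as follows. A likely reason it helps (an assumption, not something confirmed in this thread) is that train3 is a tbl_df, and selecting a single column from a tibble with [ returns another tibble rather than a plain vector, so a check such as is.factor() on the outcome column fails. The toy data below is a hypothetical stand-in for train3, and ntree = 100 is only there to keep the example fast.

```r
library(randomForestSRC)
library(tibble)

set.seed(42)

# Hypothetical stand-in for train3: a tibble with a factor outcome
toy <- tibble(
  roll_belt  = rnorm(300),
  pitch_belt = rnorm(300),
  classe     = factor(sample(c("A", "B", "C"), 300, replace = TRUE))
)

# Single-column indexing on a tibble keeps the tibble class,
# so the result is not recognised as a factor
is.factor(toy[, "classe"])

# After converting to a plain data.frame, the same indexing
# drops to a vector and the factor is recognised
is.factor(as.data.frame(toy)[, "classe"])

# Fit the classification forest on the converted data
fit <- rfsrc(classe ~ ., data = as.data.frame(toy), ntree = 100)
print(fit)
```

For the data in the question, the equivalent call would be fit <- rfsrc(classe ~ ., data = as.data.frame(train3)). Newer releases of randomForestSRC may handle tibbles differently, so the conversion may not always be needed.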


