Predicting one day ahead with a sliding window

I am struggling with a problem. I am using SparkR for time series forecasting, but this scenario also carries over to a regular R environment. Instead of an ARIMA model, I want to use regression models such as Random Forest Regression to predict the load one day ahead. I have also read about the sliding window approach for evaluating the performance of various regressors across different parameter combinations. To make this easier to follow, here is an example of my dataset structure:

Timestamp              UsageCPU   UsageMemory   Indicator   Delay
2014-01-03 21:50:00    3123       1231          1           123
2014-01-03 22:00:00    5123       2355          1           322
2014-01-03 22:10:00    3121       1233          2           321
2014-01-03 22:20:00    2111       1234          2           211
2014-01-03 22:30:00    1000       2222          2           0
2014-01-03 22:40:00    4754       1599          1           0

To use any kind of regressor, the next step is to extract features from the timestamp and convert it into a format the model can read, since these regressors cannot consume raw timestamps:

Year   Month   Day   Hour   Minute   UsageCPU   UsageMemory   Indicator   Delay
2014   1       3     21     50       3123       1231          1           123
2014   1       3     22     00       5123       2355          1           322
2014   1       3     22     10       3121       1233          2           321
2014   1       3     22     20       2111       1234          2           211

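For completeness, a minimal sketch of that conversion in plain R (assuming the data sits in a data.frame called df with the columns shown above; in SparkR the same can be done with its year/month/dayofmonth/hour/minute column functions):

df$Timestamp <- as.POSIXct(df$Timestamp, format = "%Y-%m-%d %H:%M:%S")
df$Year   <- as.integer(format(df$Timestamp, "%Y"))
df$Month  <- as.integer(format(df$Timestamp, "%m"))
df$Day    <- as.integer(format(df$Timestamp, "%d"))
df$Hour   <- as.integer(format(df$Timestamp, "%H"))
df$Minute <- as.integer(format(df$Timestamp, "%M"))
df$Timestamp <- NULL  # drop the raw timestamp once the features exist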

The next step is to create a training and a test set for the model.

# 70/30 random split of the SparkDataFrame into training and test sets
trainTest <- randomSplit(SparkDF, c(0.7, 0.3), seed = 42)
train <- trainTest[[1]]
test  <- trainTest[[2]]

Then you can train the model and generate predictions (the exact randomForest settings are not important at this point):

# Random forest regression: UsageCPU is the target, all other columns are features
model <- spark.randomForest(train, UsageCPU ~ ., type = "regression", maxDepth = 5, maxBins = 16)
# Score the held-out test set
predictions <- predict(model, test)
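
To quantify the fit beyond plotting, a small sketch of an RMSE check (my own addition; it assumes the label column is kept alongside the "prediction" column that SparkR's predict returns):

# Pull actuals and predictions into local R and compute the RMSE
local <- collect(select(predictions, "UsageCPU", "prediction"))
rmse  <- sqrt(mean((local$UsageCPU - local$prediction)^2))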

So I know all these steps, and when I plot the predicted data against the actual data it looks pretty good. But this regression model is not dynamic, which means I cannot predict one day ahead: for the next day there are no feature values such as UsageCPU, UsageMemory, etc., so I have to predict from historical values. As mentioned at the beginning, a sliding window approach might work here, but I am not sure how to apply it (to the whole dataset, or only to the training or test set).
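
One way I imagine this could look is to turn the history itself into features, so that each row carries the previous day of observations (a sketch of my own; make_lags is a hypothetical helper, df and the column names are as above):

# Build lagged copies of a series: column k holds the value from k steps back
make_lags <- function(x, n_lags) {
  sapply(1:n_lags, function(k) c(rep(NA, k), head(x, -k)))
}

n_lags <- 144  # one day of history at 10-minute resolution
lags <- as.data.frame(make_lags(df$UsageCPU, n_lags))
names(lags) <- paste0("UsageCPU_lag", 1:n_lags)

supervised <- cbind(df, lags)
supervised <- supervised[complete.cases(supervised), ]  # drop rows without a full day of history
# Fit UsageCPU ~ . on 'supervised'; to forecast the next day, feed the last
# 144 observed values as the lag features and iterate one step at a time.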

This implementation is from shabbychef and mbq:

# Rolling mean: average 'windowsize' values, advancing the window start by 'slide'
slideMean <- function(x, windowsize = 3, slide = 2) {
  idx1 <- seq(1, length(x), by = slide)            # window start indices
  idx2 <- idx1 + windowsize                        # one past each window end
  idx2[idx2 > (length(x) + 1)] <- length(x) + 1    # clamp the final window
  cx <- c(0, cumsum(x))                            # cumulative sums give O(1) window sums
  (cx[idx2] - cx[idx1]) / windowsize
}

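For reference, a quick check of what it returns (note that the final, truncated window is still divided by the full windowsize):

slideMean(1:10, windowsize = 3, slide = 2)
# 2.000000 4.000000 6.000000 8.000000 6.333333
# windows: [1,2,3] [3,4,5] [5,6,7] [7,8,9] [9,10]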

The last question concerns the window size. I want to predict the next day in hours (00, 01, 02, 03, ...), but the timestamps come at 10-minute intervals, so by my calculation the window size should be 144 (24 × 60 / 10).
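
As a quick arithmetic check:

windowsize <- 24 * 60 / 10  # 24 h × 60 min/h ÷ 10 min per observation = 144 points per day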

It would be great if someone could help me. Thanks!

1 answer


I had the same problem for time series forecasting, using neural networks. I implemented many models, and the one that worked best was a sliding window combined with neural networks; other researchers in this area confirmed this to me as well. From this we concluded that if you want to predict 1 day ahead (24 horizons) in one step, the system has to be trained on exactly that task. We proceeded as follows (a code sketch follows the list):



1. We had a sliding window of 24 hours. For illustration, let's use x = [1,2,3] here.
2. Then use the ML model to predict [4], i.e. use value 4 as the target: x = [1,2,3], y = [4]. We had a function that returns x = [1,2,3] and y = [4] and shifts the window by one step for the next training example.
3. To x = [1,2,3] we can add further features that are important to the model: x = [1,2,3, feature_x].
4. Then we minimise the error and shift the window to get x = [2,3,4, feature_x] and y = [5].
5. You could also predict two values ahead, e.g. y = [4,5].
6. Use a list to collect the outputs and plot them.
7. Make the prediction after training.
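
A minimal sketch of that windowing in R (make_xy and the toy series are my own illustration, not the exact code we used):

# Turn a series into (x, y) training pairs: each row of x is one window,
# and the 'horizon' values after it form the matching row of y.
make_xy <- function(series, window = 3, horizon = 1) {
  n <- length(series) - window - horizon + 1
  x <- matrix(sapply(1:n, function(i) series[i:(i + window - 1)]),
              nrow = n, byrow = TRUE)
  y <- matrix(sapply(1:n, function(i)
                series[(i + window):(i + window + horizon - 1)]),
              nrow = n, byrow = TRUE)
  list(x = x, y = y)
}

pairs <- make_xy(c(1, 2, 3, 4, 5, 6))
pairs$x[1, ]  # 1 2 3  -> first window
pairs$y[1, ]  # 4      -> its target
pairs$x[2, ]  # 2 3 4  -> window shifted by one step
pairs$y[2, ]  # 5
# For step 5 above, call make_xy(..., horizon = 2) to get y = [4,5], etc.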
