Scikit-learn: cross_val_predict only works for partitions

I am struggling to figure out how to implement TimeSeriesSplit in sklearn.

The suggested answer from the link below gives the same ValueError.

sklearn TimeSeriesSplit cross_val_predict only works for partitions

Here is the relevant bit from my code:

from sklearn.model_selection import cross_val_predict, TimeSeriesSplit
from sklearn import svm

features = df[df.columns[0:6]]
target = df['target']

clf = svm.SVC(random_state=0)

pred = cross_val_predict(clf, features, target, cv=TimeSeriesSplit(n_splits=5).split(features))

      


ValueError                                Traceback (most recent call last)
in ()
----> 1 pred = cross_val_predict(clf, features, target, cv=TimeSeriesSplit(n_splits=5).split(features))

/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
    407
    408     if not _check_is_permutation(test_indices, _num_samples(X)):
--> 409         raise ValueError('cross_val_predict only works for partitions')
    410
    411     inv_test_indices = np.empty(len(test_indices), dtype=int)

ValueError: cross_val_predict only works for partitions



1 answer


cross_val_predict cannot work with TimeSeriesSplit because the first chunk of the data is never part of any test set, so no predictions exist for it. cross_val_predict requires the test sets of all folds to cover every sample exactly once (a partition).

For example, when your dataset is [1, 2, 3, 4, 5] and n_splits=4:

  • fold 1 - train: [1], test: [2]
  • fold 2 - train: [1, 2], test: [3]
  • fold 3 - train: [1, 2, 3], test: [4]
  • fold 4 - train: [1, 2, 3, 4], test: [5]


None of the folds has 1 in its test set.
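The fold layout above can be reproduced with a short sketch. The toy array `data` and the names `tested` and `never_tested` are just for illustration and are not part of the question's code:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy dataset matching the example above (illustrative only)
data = np.array([1, 2, 3, 4, 5])
tscv = TimeSeriesSplit(n_splits=4)

tested = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(data), start=1):
    train_vals = data[train_idx].tolist()
    test_vals = data[test_idx].tolist()
    print(f"fold {fold} - train: {train_vals}, test: {test_vals}")
    tested.extend(test_vals)

# Values that never show up in any test set
never_tested = sorted(set(data.tolist()) - set(tested))
print("never tested:", never_tested)
```

Because the test sets do not cover the whole dataset, the permutation check inside cross_val_predict fails and the ValueError is raised.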

If you want to have predictions for 2-5, you can manually loop through the splits generated by your CV and save the predictions for 2-5 yourself.
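A minimal sketch of such a manual loop might look like the following. The synthetic `X` and `y` are stand-ins for the `features` and `target` from the question (an assumption, since the real DataFrame is not shown), and samples that never land in a test set are left as NaN:

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for the question's data (assumption): 60 rows, 6 features
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 6))
y = np.tile([0, 1], 30)  # alternating labels, so every training fold sees both classes

clf = svm.SVC(random_state=0)
tscv = TimeSeriesSplit(n_splits=5)

# NaN marks samples that never receive a prediction
pred = np.full(len(y), np.nan)
for train_idx, test_idx in tscv.split(X):
    clf.fit(X[train_idx], y[train_idx])
    pred[test_idx] = clf.predict(X[test_idx])

# Boolean mask of the samples that did get a prediction
has_pred = ~np.isnan(pred)
print(f"{has_pred.sum()} of {len(y)} samples have predictions")
```

With a pandas DataFrame such as the one in the question, the positional indices would presumably be applied via features.iloc[train_idx] and target.iloc[train_idx] instead of plain indexing.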
