Scikit-learn: cross_val_predict only works for partitions

Question

I am struggling to figure out how to implement TimeSeriesSplit in sklearn.

The suggested answer from the link below gives the same ValueError.

sklearn TimeSeriesSplit cross_val_predict only works for partitions

Here is the relevant bit from my code:

from sklearn.model_selection import TimeSeriesSplit, cross_val_predict
from sklearn import svm

features = df[df.columns[0:6]]
target = df['target']

clf = svm.SVC(random_state=0)

pred = cross_val_predict(clf, features, target, cv=TimeSeriesSplit(n_splits=5).split(features))

ValueError                                Traceback (most recent call last)
in ()
----> 1 pred = cross_val_predict(clf, features, target, cv=TimeSeriesSplit(n_splits=5).split(features))

/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
    407
    408     if not _check_is_permutation(test_indices, _num_samples(X)):
--> 409         raise ValueError('cross_val_predict only works for partitions')
    410
    411     inv_test_indices = np.empty(len(test_indices), dtype=int)

ValueError: cross_val_predict only works for partitions

python scikit-learn cross-validation

James edwards Apr 07 17 at 10:14

1 answer

Matthijs Brouns · Accepted Answer · 2017-04-07T13:38:47+0000

cross_val_predict cannot work with TimeSeriesSplit because the first partition of a TimeSeriesSplit is never part of a test set, meaning no predictions are made for it.

E.g., when your dataset is [1, 2, 3, 4, 5] and you use 4 splits:

fold 1 - train: [1], test: [2]
fold 2 - train: [1, 2], test: [3]
fold 3 - train: [1, 2, 3], test: [4]
fold 4 - train: [1, 2, 3, 4], test: [5]

None of the folds has 1 in its test set, so the test sets together do not cover every sample, and that is the check cross_val_predict fails on.
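The folds listed above can be reproduced directly. This is a minimal sketch, assuming a recent scikit-learn and using the toy dataset from the example; the names data, folds and tested are only illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy dataset from the example above
data = np.array([1, 2, 3, 4, 5])

# Collect the (train, test) values produced by each of the 4 splits
folds = [
    (data[train_idx].tolist(), data[test_idx].tolist())
    for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(data)
]

for i, (train, test) in enumerate(folds, start=1):
    print(f"fold {i} - train: {train}, test: {test}")

# Every value that appears in some test set; 1 is missing
tested = sorted(v for _, test in folds for v in test)
print("values that get a prediction:", tested)
```

Because 1 never shows up in tested, the concatenated test indices are not a permutation of all sample indices, which is what triggers the ValueError.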

If you want to have predictions for 2-5, you can manually loop through the splits generated by your CV and save the predictions for 2-5 yourself.
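A manual loop along those lines might look like the sketch below. It is not the only way to do it. The arrays X and y are synthetic stand-ins for the question's features and target (an assumption, since the original DataFrame is not shown), and rows that never land in a test fold are left as NaN.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in data (assumption): 60 rows, 6 feature columns,
# alternating labels so every training fold contains both classes
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 6))
y = np.arange(60) % 2

clf = svm.SVC(random_state=0)

# NaN marks rows that are never part of a test fold
pred = np.full(len(y), np.nan)

for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    clf.fit(X[train_idx], y[train_idx])          # train on the past only
    pred[test_idx] = clf.predict(X[test_idx])    # predict the next block

# Boolean mask of the rows that actually received a prediction
has_pred = ~np.isnan(pred)
print("rows with a prediction:", int(has_pred.sum()), "of", len(y))
```

With a pandas DataFrame like the one in the question, the same loop should work with features.iloc[train_idx] and target.iloc[train_idx] in place of the NumPy indexing. Any scoring should then be restricted to the rows where has_pred is True.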
