How to set the test size in a stratified k-fold split in Python?

Using sklearn, I want to have 3 splits (i.e. n_splits=3) of the sample dataset with a train/test ratio of 70:30. I can split the set 3 times, but I could not figure out how to set the size of the test set (as with the test_size parameter of train_test_split). Is there a way to set the size of the test fold in StratifiedKFold?

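import numpy as np
from sklearn.model_selection import StratifiedKFold as SKF

# Hypothetical sample data: 12 samples, two balanced classes
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)

skf = SKF(n_splits=3)
skf.get_n_splits(X, y)  # returns the number of folds, 3
for train_index, test_index in skf.split(X, y):
    # Loops over 3 iterations to get a stratified train/test split
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]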

      



1 answer


StratifiedKFold is by definition a K-fold split. This means that on each iteration the returned iterator gives K-1 folds for training and 1 fold for testing. K is controlled by n_splits, so the data is divided into K groups (folds) of about n_samples/K samples each, and the iterator goes through all K train/test combinations, using each fold as the test set exactly once. For more information, look up K-fold cross-validation on Wikipedia or Google.
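To see the rotation concretely, here is a minimal sketch on hypothetical toy data (X and y are made up for illustration: 12 samples, two balanced classes). With n_splits=3, each iteration trains on two folds (8 samples) and tests on the remaining one (4 samples), and every sample ends up in a test fold exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 12 samples, two balanced classes
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)

skf = StratifiedKFold(n_splits=3)
test_folds = []
for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
    # Train and test indices never overlap within one iteration
    assert set(train_index).isdisjoint(test_index)
    # Training uses the K-1 remaining folds: 8 of 12 samples here
    print(fold, "train:", len(train_index), "test:", len(test_index))
    test_folds.append(test_index)

# Across the K iterations, every sample lands in a test fold exactly once
all_test = np.sort(np.concatenate(test_folds))
print(np.array_equal(all_test, np.arange(len(y))))  # True
```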

In short, the size of the test fold will be 1/K (i.e. 1/n_splits), so you can tweak this setting to control the test size (for example, n_splits=3 gives a test partition of 1/3, about 33% of your data). Note that no integer K gives exactly 30%: n_splits=3 gives about 33% and n_splits=4 gives 25%. Also, StratifiedKFold will iterate over all K folds, each time training on the other K-1, which may not be what you want.
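A quick check of the 1/n_splits rule, again on hypothetical toy data (60 balanced samples, chosen so that every n_splits tried divides the data evenly). The test share depends only on n_splits, so only 50%, 33%, 25%, 20%, ... are available.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 60 samples, two balanced classes
X = np.zeros((60, 1))
y = np.array([0, 1] * 30)

test_share = {}
for k in (2, 3, 4, 5):
    skf = StratifiedKFold(n_splits=k)
    sizes = [len(test) for _, test in skf.split(X, y)]
    # Every fold has the same size here because 60 is divisible by k
    test_share[k] = sizes[0] / len(y)
    print(f"n_splits={k}: test fold sizes {sizes}, test share {test_share[k]:.0%}")

# None of these shares is exactly 30%
print(any(abs(share - 0.3) < 1e-9 for share in test_share.values()))  # False
```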



Having said that, you might be interested in StratifiedShuffleSplit, which returns a configurable number of splits with a configurable train/test ratio. If you only want one split, you can set n_splits=1 and test_size=0.3 (or any other ratio).
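A minimal sketch of StratifiedShuffleSplit with 3 splits at 70:30, which is what the question asks for. The data is hypothetical (100 samples, 60 of class 0 and 40 of class 1), and random_state=0 is only there to make the shuffles reproducible.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 100 samples, 60 of class 0 and 40 of class 1
X = np.zeros((100, 1))
y = np.array([0] * 60 + [1] * 40)

# 3 independent stratified splits, each with a 70:30 train/test ratio
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Each split should give 70 train / 30 test samples,
    # with the 60:40 class ratio kept in the test set
    print(len(y_train), len(y_test), np.bincount(y_test))
```

Unlike StratifiedKFold, the test sets of different StratifiedShuffleSplit iterations are drawn independently, so they can overlap and some samples may never be tested. For a single split, train_test_split(X, y, test_size=0.3, stratify=y) does the same job in one call.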
