How to set test size in kfold stratified sample in python?
Using sklearn, I want to have 3 splits (i.e. n_splits = 3) of the sample dataset with a train/test ratio of 70:30. I can split the set 3 times, but I could not figure out how to set the size of the test set (as you can with the train_test_split method). Is there a way to set the test size in StratifiedKFold?
from sklearn.model_selection import StratifiedKFold as SKF
skf = SKF(n_splits=3)
skf.get_n_splits(X, y)
for train_index, test_index in skf.split(X, y):
    # Loops over 3 iterations, each giving a stratified train/test split
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
StratifiedKFold is by definition a K-fold split. This means that the returned iterator yields K-1 folds for training and 1 fold for testing. K is controlled by n_splits, so it creates K groups of roughly n_samples/K samples each and iterates over all K train/test combinations, with each group serving as the test set once. For more information, refer to Wikipedia or search for K-fold cross-validation.
In short, the size of the test set will be 1/K (i.e. 1/n_splits), so you can tune this parameter to control the test size (for example, n_splits=3 gives a test partition of 1/3 ≈ 33% of your data). However, StratifiedKFold will always iterate over all K folds, training on K-1 of them each time, and this may not be what you want.
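To make the 1/n_splits relationship concrete, here is a minimal sketch that prints the train and test sizes produced by StratifiedKFold. The toy arrays X and y (30 samples, two balanced classes) are assumptions made up for illustration, not data from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (assumed for illustration): 30 samples, two balanced classes.
X = np.arange(60).reshape(30, 2)
y = np.array([0] * 15 + [1] * 15)

skf = StratifiedKFold(n_splits=3)

# Record (train size, test size) for every fold.
fold_sizes = []
for train_index, test_index in skf.split(X, y):
    fold_sizes.append((len(train_index), len(test_index)))

print(fold_sizes)
```

With n_splits=3, every test fold holds one third of the samples and the remaining two thirds go to training, so a 70:30 ratio cannot be reached by changing n_splits alone.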
Having said that, you might be interested in StratifiedShuffleSplit, which returns a configurable number of splits with a configurable train/test ratio. If you only want one split, you can set n_splits=1 and keep test_size=0.3 (or any other ratio).
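For the case in the question (3 splits at 70:30), a sketch along these lines should work. The toy data and the random_state value are assumptions added for reproducibility; they are not part of the original question.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (assumed for illustration): 100 samples, two balanced classes.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 50 + [1] * 50)

# 3 independent stratified splits, each holding out 30% for testing.
# random_state is fixed here only so the example is repeatable.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=0)

split_sizes = []
test_class_counts = []
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    split_sizes.append((len(train_index), len(test_index)))
    # Count samples per class in the test set to check stratification.
    test_class_counts.append(np.bincount(y_test).tolist())

print(split_sizes)
print(test_class_counts)
```

Note that, unlike the folds of StratifiedKFold, the test sets of different StratifiedShuffleSplit iterations are drawn independently and may overlap.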