Let's import the iris dataset and build a DataFrame x of independent variables and a Series y for the target variable
import pandas as pd
from sklearn.datasets import load_iris
colnames = ['sepallength', 'sepalwidth', 'petallength', 'petalwidth']
iris = load_iris()
x = iris.data
y = iris.target
x = pd.DataFrame(x, columns=colnames)
y = pd.Series(y, name='class')
iris_data = pd.concat([x, y], axis=1)
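As a quick sanity check (a small sketch added here, not part of the original walkthrough), the combined frame should contain all 150 iris rows with the four feature columns plus the class column:

```python
import pandas as pd
from sklearn.datasets import load_iris

colnames = ['sepallength', 'sepalwidth', 'petallength', 'petalwidth']
iris = load_iris()
x = pd.DataFrame(iris.data, columns=colnames)
y = pd.Series(iris.target, name='class')
iris_data = pd.concat([x, y], axis=1)

# 150 rows, 4 feature columns plus the class column
print(iris_data.shape)          # (150, 5)
print(list(iris_data.columns))
```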
Let's now create k pairs of train and test datasets
We will use just the first 6 rows to illustrate the concept. Have a look below at how the train and test datasets change for each value of k.
from sklearn.model_selection import KFold

k_fold = KFold(n_splits=3)
for k, (train_index, test_index) in enumerate(k_fold.split(iris_data[:6])):
    print('------------- Datasets when k =', k, '--------------\n')
    print('Training Dataset\n')
    print(iris_data.iloc[train_index])
    print('\n')
    print('Test Dataset\n')
    print(iris_data.iloc[test_index])
    print('\n')
In the above example, we effectively created 3 pairs of train-test datasets out of one original dataset, whereas a normal split using train_test_split would give us only 1 such pair.
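The same fold indices can be used to actually train and evaluate a model, averaging the score over the k folds. Below is a minimal sketch; LogisticRegression is chosen here purely as an illustrative estimator (the walkthrough above does not prescribe one), and shuffle=True is an assumption worth noting, since the iris rows are ordered by class and unshuffled folds would each be missing an entire class:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

iris = load_iris()
x = pd.DataFrame(iris.data, columns=['sepallength', 'sepalwidth',
                                     'petallength', 'petalwidth'])
y = pd.Series(iris.target, name='class')

# Shuffle before splitting: iris rows are sorted by class, so
# unshuffled folds would each lack one class entirely.
k_fold = KFold(n_splits=3, shuffle=True, random_state=42)

scores = []
for train_index, test_index in k_fold.split(x):
    # Fit a fresh model on the training fold, score it on the test fold
    model = LogisticRegression(max_iter=200)
    model.fit(x.iloc[train_index], y.iloc[train_index])
    scores.append(model.score(x.iloc[test_index], y.iloc[test_index]))

print('Fold accuracies:', scores)
print('Mean accuracy:', sum(scores) / len(scores))
```

Averaging over the 3 folds gives a more stable estimate of model performance than a single train_test_split, since every row appears in a test set exactly once.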