Let's import the iris dataset and create a DataFrame x of independent variables and a Series y for the target variable.

In [1]:
import pandas as pd

from sklearn.datasets import load_iris

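# Column names for the four iris features
colnames = ['sepallength', 'sepalwidth', 'petallength', 'petalwidth']

iris = load_iris()

# Feature matrix and target vector as NumPy arrays
x = iris.data
y = iris.target

# Wrap them in pandas objects with readable names
x = pd.DataFrame(x, columns=colnames)
y = pd.Series(y, name='class')

# Combine features and target into a single DataFrame
iris_data = pd.concat([x, y], axis=1)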
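
A quick sanity check of the frame built above can help before splitting. The sketch below rebuilds `iris_data` so that it runs on its own, then looks at its shape and class counts. The names `shape`, `class_counts` and `first_fifty_classes` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

colnames = ['sepallength', 'sepalwidth', 'petallength', 'petalwidth']
iris = load_iris()
x = pd.DataFrame(iris.data, columns=colnames)
y = pd.Series(iris.target, name='class')
iris_data = pd.concat([x, y], axis=1)

# 150 rows, 4 feature columns plus the 'class' column
shape = iris_data.shape
print(shape)

# Number of rows per class label (0, 1, 2)
class_counts = iris_data['class'].value_counts().sort_index()
print(class_counts)

# The rows are ordered by class, so the first 50 rows all share one label
first_fifty_classes = iris_data['class'].iloc[:50].unique().tolist()
print(first_fifty_classes)
```

Because the rows are ordered by class, the first 6 rows used in the next step all belong to class 0.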

Let's now create k sets of train and test datasets.

We will take just the first 6 rows to explain the concept. Look below at how the train and test datasets change from fold to fold. Note that the k printed in the output is the fold index (0, 1, 2), not the number of folds.

In [2]:
from sklearn.model_selection import KFold
# Split into 3 consecutive folds (no shuffling by default)
k_fold = KFold(n_splits=3)
# split() yields positional row indices for the train and test parts of each fold
for k, (train_index, test_index) in enumerate(k_fold.split(iris_data[:6])):
    print('-------------  Datasets when k = ',k, ' --------------\n')
    print('Training Dataset\n')
    print(iris_data.iloc[train_index])
    print('\n')
    print('Test Dataset\n')
    print(iris_data.iloc[test_index])
    print('\n')
-------------  Datasets when k =  0  --------------

Training Dataset

   sepallength  sepalwidth  petallength  petalwidth  class
2          4.7         3.2          1.3         0.2      0
3          4.6         3.1          1.5         0.2      0
4          5.0         3.6          1.4         0.2      0
5          5.4         3.9          1.7         0.4      0


Test Dataset

   sepallength  sepalwidth  petallength  petalwidth  class
0          5.1         3.5          1.4         0.2      0
1          4.9         3.0          1.4         0.2      0


-------------  Datasets when k =  1  --------------

Training Dataset

   sepallength  sepalwidth  petallength  petalwidth  class
0          5.1         3.5          1.4         0.2      0
1          4.9         3.0          1.4         0.2      0
4          5.0         3.6          1.4         0.2      0
5          5.4         3.9          1.7         0.4      0


Test Dataset

   sepallength  sepalwidth  petallength  petalwidth  class
2          4.7         3.2          1.3         0.2      0
3          4.6         3.1          1.5         0.2      0


-------------  Datasets when k =  2  --------------

Training Dataset

   sepallength  sepalwidth  petallength  petalwidth  class
0          5.1         3.5          1.4         0.2      0
1          4.9         3.0          1.4         0.2      0
2          4.7         3.2          1.3         0.2      0
3          4.6         3.1          1.5         0.2      0


Test Dataset

   sepallength  sepalwidth  petallength  petalwidth  class
4          5.0         3.6          1.4         0.2      0
5          5.4         3.9          1.7         0.4      0


In the above example, we effectively created 3 train-test splits out of one original dataset, whereas a normal split using train_test_split would give us only 1. Across the 3 folds, every row appears in a test dataset exactly once and in a training dataset twice.
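
The contrast can be checked directly on the indices. The following is a minimal sketch, assuming a plain array of 6 positions as a stand-in for the first 6 rows of `iris_data`. The names `indices`, `folds`, `all_test`, `single_train` and `single_test` are illustrative, and `random_state=0` is an arbitrary choice made only for repeatability.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Stand-in for the positions of the first 6 rows
indices = np.arange(6)

# KFold with 3 splits yields three (train, test) index pairs
folds = list(KFold(n_splits=3).split(indices))
n_splits = len(folds)
print(n_splits)

# Gather the test indices from every fold; together they cover all rows once
all_test = np.concatenate([test for _, test in folds])
print(sorted(all_test.tolist()))

# Within a fold, train and test never share a row
no_overlap = all(len(set(train) & set(test)) == 0 for train, test in folds)
print(no_overlap)

# train_test_split returns a single train part and a single test part
single_train, single_test = train_test_split(indices, test_size=2, random_state=0)
print(len(single_train), len(single_test))
```

With `train_test_split`, only the 2 held-out rows are ever used for testing, whereas k-fold rotates the test role through all 6 rows.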