Nested Cross-Validation for Machine Learning with Python
The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training.
This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and dataset are used to both tune and select a model, it is likely to lead to an optimistically biased evaluation of the model performance.
One approach to overcoming this bias is to nest the hyperparameter optimization procedure under the model selection procedure. This is called double cross-validation or nested cross-validation and is the preferred way to evaluate and compare tuned machine learning models.
In this tutorial, you will discover nested cross-validation for evaluating tuned machine learning models.
After completing this tutorial, you will know:
Hyperparameter optimization can overfit a dataset and provide an optimistic evaluation of a model that should not be used for model selection.
Nested cross-validation provides a way to reduce the bias in combined hyperparameter tuning and model selection.
How to implement nested cross-validation for evaluating tuned machine learning algorithms in scikit-learn.
Let’s get started.
Nested Cross-Validation for Machine Learning with Python. Photo by Andrew Bone, some rights reserved.
Tutorial Overview
This tutorial is divided into three parts; they are:
Combined Hyperparameter Tuning and Model Selection
What Is Nested Cross-Validation
Nested Cross-Validation With Scikit-Learn
Combined Hyperparameter Tuning and Model Selection
It is common to evaluate machine learning models on a dataset using k-fold cross-validation.
The k-fold cross-validation procedure divides a limited dataset into k non-overlapping folds. Each of the k folds is given an opportunity to be used as a held back test set whilst all other folds collectively are used as a training dataset. A total of k models are fit and evaluated on the k holdout test sets and the mean performance is reported.
For more on the k-fold cross-validation procedure, see the tutorial:
A Gentle Introduction to k-fold Cross-Validation
The procedure provides an estimate of the model performance on the dataset when making predictions on data not used during training. It is less biased than some other techniques, such as a single train-test split, for small to modestly sized datasets. Common values for k are k=3, k=5, and k=10.
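For example, the listing below gives a minimal sketch of evaluating a model with 10-fold cross-validation in scikit-learn. The synthetic dataset created with make_classification() and the RandomForestClassifier model are illustrative assumptions only; you would substitute your own dataset and model.

# minimal sketch: estimate model performance with k-fold cross-validation
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# create a synthetic binary classification dataset (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# configure the cross-validation procedure with k=10 folds
cv = KFold(n_splits=10, shuffle=True, random_state=1)
# define the model to evaluate
model = RandomForestClassifier(random_state=1)
# evaluate the model, giving one accuracy score per held-back fold
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report the mean and standard deviation of the fold scores
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

Running the example fits and evaluates k models, one per fold, and reports the mean and standard deviation of the accuracy scores as the estimate of model performance.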
Each machine learning algorithm includes one or more hyperparameters that allow the algorithm behavior to be tailored to a specific dataset. The trouble is, there are rarely, if ever, good heuristics for how to configure the model hyperparameters for a dataset. Instead, an optimization procedure...