Test-Time Augmentation For Structured Data With Scikit-Learn
Test-time augmentation, or TTA for short, is a technique for improving the skill of predictive models.
It is typically used to improve the predictive performance of deep learning models on image datasets, where predictions are averaged across multiple augmented versions of each image in the test dataset.
Although popular with image datasets and neural network models, test-time augmentation can be used with any machine learning algorithm on tabular datasets, such as those often seen in regression and classification predictive modeling problems.
In this tutorial, you will discover how to use test-time augmentation for tabular data in scikit-learn.
After completing this tutorial, you will know:
Test-time augmentation is a technique for improving model performance and is commonly used for deep learning models on image datasets.
How to implement test-time augmentation for regression and classification tabular datasets in Python with scikit-learn.
How to tune the number of synthetic examples and amount of statistical noise used in test-time augmentation.
Let’s get started.
Test-Time Augmentation With Scikit-Learn
Photo by barnimages, some rights reserved.
This tutorial is divided into three parts; they are:
Test-Time Augmentation
Standard Model Evaluation
Test-Time Augmentation Example
Test-Time Augmentation
Test-time augmentation, or TTA for short, is a technique for improving the skill of a predictive model.
It is a procedure implemented when using a fit model to make predictions, such as on a test dataset or on new data. The procedure involves creating multiple slightly modified copies of each example in the dataset. A prediction is made for each modified example and the predictions are averaged to give a more accurate prediction for the original example.
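The procedure can be sketched for tabular data as follows. This is a minimal illustration rather than a definitive implementation: the helper name `tta_predict_proba`, its parameters `n_copies` and `noise_std`, the choice of zero-mean Gaussian noise as the modification, and the choice to average predicted class probabilities are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def tta_predict_proba(model, X, n_copies=5, noise_std=0.1, seed=0):
    """Average class probabilities over noisy copies of each row of X.

    Hypothetical helper: the name, parameters, and Gaussian noise are
    assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Start with the prediction for the unmodified examples.
    total = model.predict_proba(X)
    for _ in range(n_copies):
        # Slightly modified copy: add zero-mean Gaussian noise to every feature.
        X_noisy = X + rng.normal(0.0, noise_std, size=X.shape)
        total += model.predict_proba(X_noisy)
    # Mean over the original plus the n_copies modified versions.
    return total / (n_copies + 1)


# Synthetic binary classification data, split in half for train and test.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1
)
model = LogisticRegression().fit(X_train, y_train)

# One averaged probability row per original test example.
proba = tta_predict_proba(model, X_test)
y_pred = model.classes_[np.argmax(proba, axis=1)]
```

The model is fit once on unmodified training data; the augmentation only happens at prediction time. Whether this helps on a given dataset depends on the model and on how well `noise_std` matches the scale of the features.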
TTA is often used with image classification, where image data augmentation creates multiple modified versions of each image, such as crops, zooms, rotations, and other image-specific modifications. Averaging predictions across these versions often results in a lift in the performance of image classification algorithms on standard datasets.
The 2015 paper “Very Deep Convolutional Networks for Large-Scale Image Recognition,” which achieved then state-of-the-art results on the ILSVRC dataset, uses horizontal-flip test-time augmentation:
We also augment the test set by horizontal flipping of the images; the soft-max class posteriors of the original and flipped images are averaged to obtain the final scores for the image.
— Very Deep Convolutional Networks for Large-Scale Image Recognition, 2015.
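The averaging described in the quote can be sketched with NumPy alone. This is not the paper's code: `flip_tta_scores` is a hypothetical helper, and `toy_logits` is a stand-in random linear layer used only so the example runs without a trained network.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def flip_tta_scores(predict_logits, images):
    """Average soft-max posteriors of original and horizontally flipped images.

    images has shape (n, height, width, channels); predict_logits is any
    callable mapping such a batch to (n, n_classes) logits.
    """
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    p_orig = softmax(predict_logits(images))
    p_flip = softmax(predict_logits(flipped))
    return (p_orig + p_flip) / 2.0


# Hypothetical stand-in for a trained network: a fixed random linear layer
# applied to flattened 8x8 RGB images, producing 4 class logits.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8 * 8 * 3, 4))


def toy_logits(batch):
    return batch.reshape(len(batch), -1) @ weights


images = rng.random((5, 8, 8, 3))
scores = flip_tta_scores(toy_logits, images)
```

A side effect of this scheme is that the final scores are the same whether the model is given an image or its mirror image, since both versions contribute equally to the average.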
For more on test-time augmentation with image data, see the tutorial:
How to Use Test-Time Augmentation to Make Better Predictions
Although often used for image...