Singular Value Decomposition for Dimensionality Reduction in Python


Reducing the number of input variables for a predictive model is referred to as dimensionality reduction.
Fewer input variables can result in a simpler predictive model that may have better performance when making predictions on new data.
Perhaps the most popular technique for dimensionality reduction in machine learning is Singular Value Decomposition, or SVD for short. This is a technique that comes from the field of linear algebra and can be used as a data preparation technique to create a projection of a sparse dataset prior to fitting a model.
In this tutorial, you will discover how to use SVD for dimensionality reduction when developing predictive models.
After completing this tutorial, you will know:

Dimensionality reduction involves reducing the number of input variables or columns in modeling data.
SVD is a technique from linear algebra that can be used to automatically perform dimensionality reduction.
How to evaluate predictive models that use an SVD projection as input and make predictions with new raw data.

Let’s get started.

Singular Value Decomposition for Dimensionality Reduction in Python. Photo by Kimberly Vardeman, some rights reserved.

Tutorial Overview
This tutorial is divided into three parts; they are:

Dimensionality Reduction and SVD
SVD Scikit-Learn API
Worked Example of SVD for Dimensionality Reduction

Dimensionality Reduction and SVD
Dimensionality reduction refers to reducing the number of input variables for a dataset.
If your data is represented using rows and columns, such as in a spreadsheet, then the input variables are the columns that are fed as input to a model to predict the target variable. Input variables are also called features.
We can consider the columns of data representing dimensions on an n-dimensional feature space and the rows of data as points in that space. This is a useful geometric interpretation of a dataset.
In a dataset with k numeric attributes, you can visualize the data as a cloud of points in k-dimensional space …
— Page 305, Data Mining: Practical Machine Learning Tools and Techniques, 4th edition, 2016.
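To make this geometric interpretation concrete, the short sketch below shows a data matrix whose rows are points and whose columns are dimensions of the feature space. The synthetic dataset from scikit-learn's make_classification() function is used purely for illustration.

# a dataset viewed geometrically: each row is a point,
# each column is a dimension of the feature space
from sklearn.datasets import make_classification
# synthetic data: 100 points in a 5-dimensional feature space
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
# confirm each row is a point with 5 coordinates
print(X.shape)

Running the sketch reports the shape (100, 5), that is, 100 points in a 5-dimensional space.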
Having a large number of dimensions in the feature space can mean that the volume of that space is very large, and in turn, the points that we have in that space (rows of data) often represent a small and non-representative sample.
This can dramatically impact the performance of machine learning algorithms fit on data with many input features, a problem generally referred to as the “curse of dimensionality.”
Therefore, it is often desirable to reduce the number of input features. This reduces the number of dimensions of the feature space, hence the name “dimensionality reduction.”
A popular approach to dimensionality reduction is to use techniques from the field of linear algebra. Singular Value Decomposition, or SVD, is one such technique that projects the data onto a lower-dimensional subspace.
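As a minimal sketch of what this looks like in practice, the example below uses the TruncatedSVD class from the scikit-learn library to project a synthetic dataset from 20 input features down to 10. The dataset and the number of components here are illustrative assumptions, not recommended values.

# minimal sketch: reduce dimensionality with a truncated SVD projection
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
# define a synthetic dataset with 20 input features (illustrative values)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=7)
# define the transform to project the data onto 10 dimensions
svd = TruncatedSVD(n_components=10)
# fit the transform on the dataset and apply it
X_reduced = svd.fit_transform(X)
# shapes: (1000, 20) before the projection, (1000, 10) after
print(X.shape, X_reduced.shape)

Running the sketch first reports the shape of the raw dataset, then the shape of the projected dataset with the reduced number of features. A later section of this tutorial walks through the scikit-learn API in more detail.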
