6 Dimensionality Reduction Algorithms With Python
Dimensionality reduction is an unsupervised learning technique.
Nevertheless, it can be used as a data transform pre-processing step for supervised learning algorithms on classification and regression predictive modeling datasets.
There are many dimensionality reduction algorithms to choose from and no single best algorithm for all cases. Instead, it is a good idea to explore a range of dimensionality reduction algorithms and different configurations for each algorithm.
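As a rough illustration of both ideas, the sketch below places a dimensionality reduction transform in front of a supervised model and tries a few configurations of it. The synthetic dataset, the choice of PCA and logistic regression, and the candidate values for n_components are all assumptions made for this example, not recommendations.

```python
# Minimal sketch: dimensionality reduction as a pre-processing step in a pipeline.
# Dataset sizes, the model, and the candidate n_components values are illustrative assumptions.
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# synthetic classification dataset with 20 input variables
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=10, random_state=7)

results = {}
for n_components in [5, 10, 15]:
    # the transform is fit on the training folds only, because it lives inside the pipeline
    steps = [('pca', PCA(n_components=n_components)),
             ('m', LogisticRegression(max_iter=1000))]
    model = Pipeline(steps=steps)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=5)
    results[n_components] = mean(scores)
    print('n_components=%d accuracy=%.3f' % (n_components, results[n_components]))
```

Wrapping the transform and the model in a Pipeline keeps the reduction from seeing the held-out fold during cross-validation, which is why the comparison across configurations is a fair one.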
In this tutorial, you will discover how to fit and evaluate top dimensionality reduction algorithms in Python.
After completing this tutorial, you will know:
Dimensionality reduction seeks a lower-dimensional representation of numerical input data that preserves the salient relationships in the data.
There are many different dimensionality reduction algorithms and no single best method for all datasets.
How to implement, fit, and evaluate top dimensionality reduction algorithms in Python with the scikit-learn machine learning library.
Let’s get started.
This tutorial is divided into three parts; they are:
Dimensionality Reduction Algorithms
Examples of Dimensionality Reduction
Scikit-Learn Library Installation
Principal Component Analysis
Singular Value Decomposition
Linear Discriminant Analysis
Locally Linear Embedding
Modified Locally Linear Embedding
Dimensionality reduction refers to techniques for reducing the number of input variables in training data.
When dealing with high dimensional data, it is often useful to reduce the dimensionality by projecting the data to a lower dimensional subspace which captures the “essence” of the data. This is called dimensionality reduction.
— Page 11, Machine Learning: A Probabilistic Perspective , 2012.
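To make the idea of projecting onto a lower-dimensional subspace concrete, here is a minimal NumPy sketch. It assumes a made-up dataset whose 10 columns are driven by only 2 underlying directions plus a little noise, and it uses a singular value decomposition of the centered data to find the subspace. This is one linear way to do the projection; it is not the only technique covered by the term.

```python
# Minimal sketch: linear projection of 10-dimensional data onto a 2-dimensional subspace.
# The synthetic data and the choice k=2 are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(1)
# 100 rows that vary along 2 hidden directions, embedded in 10 columns with small noise
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 10))

# center each column, then decompose
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# keep the k directions with the largest singular values and project onto them
k = 2
X_projected = X_centered @ Vt[:k].T

# fraction of the total variance retained by the k-dimensional representation
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(X.shape, X_projected.shape)  # (100, 10) (100, 2)
print('variance retained: %.4f' % explained)
```

Because the data was built from two hidden directions, almost all of the variance survives the projection. On real data the retained fraction depends on how much structure the inputs actually share.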
High-dimensionality might mean hundreds, thousands, or even millions of input variables.
Fewer input dimensions often mean correspondingly fewer parameters or a simpler structure in the machine learning model, referred to as degrees of freedom. A model with too many degrees of freedom is likely to overfit the training dataset and may not perform well on new data.
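The link between input count and parameter count is easiest to see with a linear model, which learns one coefficient per input variable. The sketch below assumes a synthetic regression dataset and uses PCA as the reduction step purely for illustration. It only counts coefficients and makes no claim about which model predicts better.

```python
# Minimal sketch: fewer inputs means fewer coefficients in a linear model.
# The dataset shape and n_components=10 are assumptions chosen for illustration.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# synthetic regression dataset with 100 input variables
X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=0.1, random_state=1)

# linear model on all 100 inputs
full = LinearRegression().fit(X, y)

# the same model on a 10-dimensional projection of the inputs
X_reduced = PCA(n_components=10).fit_transform(X)
reduced = LinearRegression().fit(X_reduced, y)

n_full = full.coef_.shape[0]
n_reduced = reduced.coef_.shape[0]
print(n_full, n_reduced)  # 100 10
```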
It is desirable to have simple models that generalize well, and in turn, input data with few input variables. This is particularly true for linear models where the number of inputs and the degrees of freedom of...