Machine Learning Mastery

Why Data Preparation Is So Important in Machine Learning
On a predictive modeling project, machine learning algorithms learn a mapping from input variables to a target variable. The most common form...

Ordinal and One-Hot Encodings for Categorical Data
Machine learning models require all input and output variables to be numeric. This means that if your data contains categorical data, you must...
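The full article's code is not shown in this excerpt; as a minimal sketch of the two encodings it names, using scikit-learn's `OrdinalEncoder` and `OneHotEncoder` on toy color data (the example data is my own, not from the article):

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# toy categorical input: one feature with three distinct categories
data = [["red"], ["green"], ["blue"], ["green"]]

# ordinal encoding: each category becomes a single integer
ordinal_matrix = OrdinalEncoder().fit_transform(data)

# one-hot encoding: one binary column per category
# (.toarray() converts the default sparse result to a dense array)
onehot_matrix = OneHotEncoder().fit_transform(data).toarray()
```

Categories are ordered alphabetically by default, so `blue` maps to 0, `green` to 1, and `red` to 2 in the ordinal encoding.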

How to Use StandardScaler and MinMaxScaler Transforms in Python
Many machine learning algorithms perform better when numerical input variables are scaled to a standard range. This includes algorithms that...
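As a brief sketch of the two transforms named in the title, applied to a toy column (data values are illustrative, not from the article):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# StandardScaler: subtract the mean, divide by the standard deviation
standardized = StandardScaler().fit_transform(X)

# MinMaxScaler: rescale values into the [0, 1] range
normalized = MinMaxScaler().fit_transform(X)
```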

How to Perform Feature Selection With Numerical Input Data
Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature...
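One common approach for numerical inputs (assuming a classification target; the article may cover other scoring functions too) is `SelectKBest` with the ANOVA F-statistic:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# synthetic data: 10 features, only 3 of them informative
X, y = make_classification(n_samples=100, n_features=10,
                           n_informative=3, random_state=1)

# keep the 3 features with the highest ANOVA F-score against the target
X_selected = SelectKBest(score_func=f_classif, k=3).fit_transform(X, y)
```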

Iterative Imputation for Missing Values in Machine Learning
Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify...
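A minimal sketch of iterative imputation with scikit-learn's `IterativeImputer` (note the experimental enabling import it requires; the toy data is my own):

```python
import numpy as np
# IterativeImputer is experimental and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [7.0, 8.0]])

# each feature with missing values is modeled as a function of the others
X_filled = IterativeImputer(random_state=0).fit_transform(X)
```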

Test-Time Augmentation For Structured Data With Scikit-Learn
Test-time augmentation, or TTA for short, is a technique for improving the skill of predictive models. It is typically used to improve the...
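One way TTA can be adapted to tabular data is to average a model's predicted probabilities over several noise-perturbed copies of the test rows; this is an illustrative sketch under that assumption, not necessarily the article's exact procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=1)
model = LogisticRegression().fit(X, y)

rng = np.random.default_rng(1)

def tta_predict(model, X, n_copies=10, scale=0.05):
    """Average predicted probabilities over noise-perturbed copies of X."""
    probs = [model.predict_proba(X + rng.normal(0.0, scale, X.shape))
             for _ in range(n_copies)]
    return np.mean(probs, axis=0).argmax(axis=1)

preds = tta_predict(model, X)
```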

How to Use Polynomial Feature Transforms for Machine Learning
Often, the input features for a predictive modeling task interact in unexpected and often nonlinear ways. These interactions can be identified...
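A tiny sketch of `PolynomialFeatures` on a single two-feature row, showing how interaction and squared terms are generated (toy input is my own):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])

# degree-2 expansion: bias, x1, x2, x1^2, x1*x2, x2^2
poly_out = PolynomialFeatures(degree=2).fit_transform(X)
```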

How to Scale Data With Outliers for Machine Learning
Many machine learning algorithms perform better when numerical input variables are scaled to a standard range. This includes algorithms that...
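For data with outliers, a common choice (presumably the one covered here) is `RobustScaler`, which centers on the median and scales by the interquartile range so extreme values have less influence:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# the value 100.0 is an outlier that would distort mean/std-based scaling
X = np.array([[1.0], [2.0], [3.0], [100.0]])

X_robust = RobustScaler().fit_transform(X)
```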

Recursive Feature Elimination (RFE) for Feature Selection in Python
Recursive Feature Elimination, or RFE for short, is a popular feature selection algorithm. RFE is popular because it is easy to configure and...
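A minimal sketch of RFE wrapping a decision tree (the estimator and data here are illustrative choices, not necessarily the article's):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=10,
                           n_informative=5, random_state=1)

# repeatedly fit the estimator and drop the weakest feature
# until only 5 features remain
rfe = RFE(estimator=DecisionTreeClassifier(random_state=1),
          n_features_to_select=5)
rfe.fit(X, y)

selected_mask = rfe.support_  # boolean mask of kept features
```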

How to Use Discretization Transforms for Machine Learning
Numerical input variables may have a highly skewed or nonstandard distribution. This could be caused by outliers in the data, multimodal...
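A short sketch of discretization with `KBinsDiscretizer`, mapping a continuous variable into a small number of ordinal bins (bin count and strategy here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# a continuous input variable
X = np.random.default_rng(1).normal(size=(100, 1))

# split the value range into 5 equal-width bins, label them 0..4
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_binned = disc.fit_transform(X)
```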

How to Use Quantile Transforms for Machine Learning
Numerical input variables may have a highly skewed or nonstandard distribution. This could be caused by outliers in the data, multimodal...
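A minimal sketch of `QuantileTransformer` mapping a skewed variable onto a Gaussian-like distribution (the exponential toy data is my own):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# heavily right-skewed input
X = np.random.default_rng(1).exponential(size=(1000, 1))

# rank-based transform to an approximately normal distribution
qt = QuantileTransformer(n_quantiles=100, output_distribution="normal")
X_gauss = qt.fit_transform(X)
```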

How to Use Power Transforms for Machine Learning
Machine learning algorithms like Linear Regression and Gaussian Naive Bayes assume the numerical variables have a Gaussian probability...
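A brief sketch with `PowerTransformer` using the Yeo-Johnson method (which, unlike Box-Cox, also accepts zero and negative values); the toy data is illustrative:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# skewed input variable
X = np.random.default_rng(1).exponential(size=(1000, 1))

# Yeo-Johnson power transform; standardizes to zero mean by default
pt = PowerTransformer(method="yeo-johnson")
X_power = pt.fit_transform(X)
```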

How to Use Power Transforms With scikit-learn
Machine learning algorithms like Linear Regression and Gaussian Naive Bayes assume the numerical variables have a Gaussian probability...

Statistical Imputation for Missing Values in Machine Learning
Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify...
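A minimal sketch of statistical imputation with `SimpleImputer`, replacing a missing value with the column mean (toy data is my own):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0], [np.nan], [3.0]])

# replace NaN with the mean of the observed values (here, 2.0)
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)
```

Other statistics such as `"median"` and `"most_frequent"` are available via the `strategy` parameter.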

Linear Discriminant Analysis for Dimensionality Reduction in Python
Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a...

Singular Value Decomposition for Dimensionality Reduction in Python
Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a...
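A minimal sketch of SVD-based dimensionality reduction with scikit-learn's `TruncatedSVD` (component count and data sizes are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD

X, y = make_classification(n_samples=100, n_features=20, random_state=1)

# project 20 input features down to 5 SVD components
X_reduced = TruncatedSVD(n_components=5, random_state=1).fit_transform(X)
```

Unlike LDA, this transform is unsupervised: the target `y` is not used to choose the projection.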