Feature Engineering and Selection (Book Review)


Data preparation is the process of transforming raw data into a form that learning algorithms can use.
In some cases, data preparation is a required step in order to provide the data to an algorithm in its required input format. In other cases, the most appropriate representation of the input data is not known and must be explored in a trial-and-error manner in order to discover what works best for a given model and dataset.
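To make the trial-and-error idea concrete, here is a minimal sketch of my own; it is not taken from the book. It compares two candidate preparations (leaving the data raw versus min-max scaling) by scoring the same simple model on each. The toy data, the 1-nearest-neighbor model, and the function names are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math

# Hypothetical toy data: column 0 is informative but small in scale,
# column 1 is uninformative but large in scale.
X = [(0.10, 100.0), (0.20, 500.0), (0.15, 900.0),
     (0.90, 110.0), (0.80, 510.0), (0.85, 910.0)]
y = [0, 0, 0, 1, 1, 1]


def identity(train_rows):
    """Candidate 1: leave the data exactly as it is."""
    return lambda row: row


def min_max(train_rows):
    """Candidate 2: rescale each column using ranges learned from the training rows only."""
    cols = list(zip(*train_rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard against a zero range
    return lambda row: tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))


def loo_accuracy(X, y, fit_transform):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Fit the preparation on the training rows so the held-out row stays unseen.
        transform = fit_transform(train_X)
        query = transform(X[i])
        nearest = min(range(len(train_X)),
                      key=lambda j: math.dist(query, transform(train_X[j])))
        correct += train_y[nearest] == y[i]
    return correct / len(X)


results = {name: loo_accuracy(X, y, prep)
           for name, prep in [("raw", identity), ("min_max", min_max)]}
print(results)
```

On this contrived data the raw representation scores 0.0 and the scaled one scores 1.0, because the large-scale noise column dominates the distance calculation until it is rescaled. Real datasets are rarely this clear-cut, which is why the comparison has to be run rather than assumed.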
Max Kuhn and Kjell Johnson have written a new book focused on this important topic of data preparation and how to get the most out of your data on a predictive modeling project with machine learning algorithms. The title of the book is “Feature Engineering and Selection: A Practical Approach for Predictive Models” and it was released in 2019.
In this post, you will discover my review and breakdown of the book “Feature Engineering and Selection” on the topic of data preparation for machine learning.
Let’s dive in!

Overview
This post is divided into three parts:

Feature Engineering and Selection
Breakdown of the Book
Final Thoughts on the Book

Feature Engineering and Selection
“Feature Engineering and Selection: A Practical Approach for Predictive Models” is a book written by Max Kuhn and Kjell Johnson and published in 2019.
Kuhn and Johnson are the authors of one of my favorite books on practical machine learning, “Applied Predictive Modeling,” published in 2013. Kuhn is also the author of the popular caret R package for machine learning. As such, I will immediately buy and devour any book they publish.
This new book is focused on the problem of data preparation for machine learning.
The authors highlight that although fitting and evaluating models is routine, achieving good performance for a predictive modeling problem is highly dependent upon how the data is prepared.
Despite our attempts to follow these good practices, we are sometimes frustrated to find that the best models have less-than-anticipated, less-than-useful predictive performance. This lack of performance may be due to […] relevant predictors that were collected are represented in a way that models have trouble achieving good performance.
— Page xi, “Feature Engineering and Selection,” 2019.
They refer to the process of preparing data for modeling as “feature engineering.”
This is a slightly different definition than I am used to. I would call it “data preparation” or “data preprocessing” and hold “feature engineering” apart as a subtask focused on systematic steps for creating new input variables from existing data.
Nevertheless, I see where they are coming from, as all data preparation could fit that definition.
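As a small illustration of the narrower sense of the term that I am used to, the sketch below derives new input variables from fields already present in a record. The record layout, the field names, and the add_features function are hypothetical and are not drawn from the book.

```python
from datetime import datetime


def add_features(record):
    """Return a copy of the record with new inputs derived from existing fields."""
    ts = datetime.fromisoformat(record["pickup_time"])
    out = dict(record)
    out["hour"] = ts.hour                    # time of day, taken from the timestamp
    out["is_weekend"] = ts.weekday() >= 5    # Saturday is 5, Sunday is 6
    out["price_per_km"] = record["fare"] / record["distance_km"]  # ratio of two inputs
    return out


# Hypothetical raw record
row = {"pickup_time": "2019-07-06T18:30:00", "fare": 24.0, "distance_km": 8.0}
engineered = add_features(row)
print(engineered)
```

Nothing here cleans or reformats the raw values. It only adds columns, which is the distinction I have in mind when I separate feature engineering from the rest of data preparation.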
Adjusting and reworking the predictors to enable models to better uncover...
