Logging and Debugging in Machine Learning - How to use Python debugger and the logging module to find errors in your AI application

Have you ever been stuck on an error for way too long? I remember once spending over 2 weeks on a stupid little typo that didn’t crash the program but returned inexplicable results. I literally couldn’t sleep because of it. Because I’m 100% certain this has happened to you as well, in this 4th episode of the “Deep Learning in Production” series we focus on how to debug Deep Learning code and how to use logging to catch bugs and errors before deploying our model. We will use Tensorflow to showcase some examples (following the image segmentation example we have built over the past 3 articles), but the exact same principles also apply to PyTorch and all the other AI frameworks.

As I said in the introduction of the series, Machine Learning is ordinary software and should always be treated as such. One of the most essential parts of the software development lifecycle is debugging. Proper debugging helps eliminate future pain once our algorithms are up and running and used by real users, and makes our system as robust and reliable as our users expect it to be. It is also integral in the early stages of coding, where it speeds up the development of our algorithm.


How to debug Deep Learning?
Useful Tensorflow debugging and logging functions

How to debug Deep Learning?

Debugging Deep Learning code is more difficult than debugging ordinary software, for several reasons:

Poor model performance doesn’t necessarily mean bugs in the code

The iteration cycle (building the model, training, and testing) is quite long

Training/testing data can also have errors and anomalies

Hyperparameters affect the final accuracy

Training is not always deterministic (e.g. probabilistic machine learning)

Some frameworks use a static computation graph (e.g. Tensorflow and CNTK), which makes intermediate values harder to inspect
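To make the point about data errors concrete, here is a minimal sketch of a sanity check that can run before training starts. The function name `find_bad_samples`, the toy dataset, the expected `[0, 1]` feature range, and the binary labels are all assumptions made for illustration; a real pipeline would adapt the checks to its own data.

```python
import math


def find_bad_samples(samples, low=0.0, high=1.0):
    """Return (index, reason) pairs for samples that look corrupted."""
    problems = []
    for i, (features, label) in enumerate(samples):
        if any(math.isnan(x) for x in features):
            problems.append((i, "NaN feature"))
        elif any(not low <= x <= high for x in features):
            problems.append((i, "feature out of range"))
        # Hypothetical assumption: labels are binary (0 or 1).
        if label not in (0, 1):
            problems.append((i, "unexpected label"))
    return problems


# Toy dataset of (features, label) pairs; three entries are deliberately broken.
dataset = [
    ([0.1, 0.9], 1),
    ([float("nan"), 0.2], 0),
    ([0.5, 7.0], 1),
    ([0.3, 0.4], 2),
]

problems = find_bad_samples(dataset)
print(problems)
```

Running a check like this first helps separate "the data is broken" from "the model is broken", which is exactly the ambiguity the list above describes.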

Based on the above, the best way to start thinking about debugging is to simplify the ML model development process as much as possible. And I mean simplify to a ridiculous level. When experimenting with our model, we should start from a very simple algorithm with only a handful of features, then gradually expand by adding features and tuning hyperparameters while keeping the model simple. Once we find a satisfactory set of features, we can start increasing our model’s complexity, keeping track of the metrics and continuing incrementally until the results are satisfactory for our application.
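One way to picture this "start ridiculously simple" workflow is the sketch below. It is not the segmentation model from this series: the toy data, the mean-predicting baseline, and the one-feature line are illustrative assumptions, chosen only to show the habit of recording a metric at each small step up in complexity.

```python
def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def fit_mean(targets):
    """Simplest possible model: always predict the mean target."""
    mean = sum(targets) / len(targets)
    return lambda x: mean


def fit_line(xs, targets):
    """One-feature least-squares line, the next step up in complexity."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(targets) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical toy data: a single feature and a roughly linear target.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Evaluate each model in order of complexity and keep track of the metric.
results = {}
for name, model in [("mean baseline", fit_mean(ys)), ("one-feature line", fit_line(xs, ys))]:
    results[name] = mse([model(x) for x in xs], ys)
    print(f"{name}: mse={results[name]:.3f}")
```

If a more complex model ever scores worse than the trivial baseline, that is a strong hint that a bug, rather than a modelling limitation, is at play.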

But even then, bugs and anomalies might occur. Actually, they will definitely occur. When they do, our next step is to take advantage of Python’s debugging capabilities.
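As a first taste of those capabilities, the sketch below pauses execution only when a value looks wrong, using `pdb.set_trace()` from the standard library. The `normalize` function, its `debug` flag, and the pixel values are hypothetical and exist only for illustration.

```python
import pdb


def normalize(pixels, debug=False):
    """Scale pixel values to [0, 1] and collect any value outside that range."""
    scaled = [p / 255.0 for p in pixels]
    suspicious = [v for v in scaled if not 0.0 <= v <= 1.0]
    if suspicious and debug:
        # Pauses here and opens an interactive Pdb prompt. Useful commands:
        # `p scaled` prints a variable, `n` runs the next line, `c` continues.
        pdb.set_trace()
    return scaled, suspicious


# With debug=False (the default) the function never stops; pass debug=True
# in an interactive session to inspect the state when a bad value appears.
scaled, suspicious = normalize([0, 128, 255, 300])
print(suspicious)
```

Guarding the breakpoint behind a flag keeps the same code usable in automated runs, where an interactive prompt would otherwise hang the process.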

Python debugger (Pdb)

The Python debugger is part of the Python standard library. **The debugger...