Algorithm helps artificial intelligence systems dodge “adversarial” inputs

In a perfect world, what you see is what you get. If this were the case, the job of artificial intelligence systems would be refreshingly straightforward.

Take collision avoidance systems in self-driving cars. If visual input to on-board cameras could be trusted entirely, an AI system could directly map that input to an appropriate action — steer right, steer left, or continue straight — to avoid hitting a pedestrian that its cameras see in the road.

But what if there’s a glitch in the cameras that slightly shifts an image by a few pixels? If the car blindly trusted so-called “adversarial inputs,” it might take unnecessary and potentially dangerous action.

A new deep-learning algorithm developed by MIT researchers is designed to help machines navigate in the real, imperfect world, by building a healthy “skepticism” of the measurements and inputs they receive.

The team combined a reinforcement-learning algorithm with a deep neural network, both used separately to train computers in playing video games like Go and chess, to build an approach they call CARRL, for Certified Adversarial Robustness for Deep Reinforcement Learning.

The researchers tested the approach in several scenarios, including a simulated collision-avoidance test and the video game Pong, and found that CARRL performed better — avoiding collisions and winning more Pong games — over standard machine-learning techniques, even in the face of uncertain, adversarial inputs.

“You often think of an adversary being someone who’s hacking your computer, but it could also just be that your sensors are not great, or your measurements aren’t perfect, which is often the case,” says Michael Everett, a postdoc in MIT’s Department of Aeronautics and Astronautics (AeroAstro). “Our approach helps to account for that imperfection and make a safe decision. In any safety-critical domain, this is an important approach to be thinking about.”

Everett is the lead author of a study outlining the new approach, which appears in IEEE’s Transactions on Neural Networks and Learning Systems . The study originated from MIT PhD student Björn Lütjens’ master’s thesis and was advised by MIT AeroAstro Professor Jonathan How.

Possible realities

To make AI systems robust against adversarial inputs, researchers have tried implementing defenses for supervised learning. Traditionally, a neural network is trained to associate specific labels or actions with given inputs. For instance, a neural network that is fed thousands of images labeled as cats, along with images labeled as houses and hot dogs, should correctly label a new image as a cat.

In robust AI systems, the same supervised-learning techniques could be tested with many slightly altered versions of the image. If the network lands on the same label — cat — for every image, there’s a good chance that, altered or not, the image is indeed of a cat, and the...