The Easy Questions That Stump Computers

One evening last October, the artificial-intelligence researcher Gary Marcus was amusing himself on his iPhone by making a state-of-the-art neural network look stupid. Marcus’s target, a deep-learning network called GPT-2, had recently become famous for its uncanny ability to generate plausible-sounding English prose with just a sentence or two of prompting. When journalists at The Guardian fed it text from a report on Brexit, GPT-2 wrote entire newspaper-style paragraphs, complete with convincing political and geographic references. Marcus, a prominent critic of AI hype, gave the neural network a pop quiz. He typed the following into GPT-2:
What happens when you stack kindling and logs in a fireplace and then drop some matches is that you typically start a …
Surely a system smart enough to contribute to The New Yorker would have no trouble completing the sentence with the obvious word, “fire.” GPT-2 responded with “ick.” In another attempt, it suggested that dropping matches on logs in a fireplace would start an “irc channel full of people.”
Marcus wasn’t surprised. Commonsense reasoning—the ability to make mundane inferences using basic knowledge about the world, like the fact that “matches” plus “logs” usually equals “fire”—has resisted AI researchers’ efforts for decades. Marcus posted the exchanges to his Twitter account with his own added commentary: “LMAO,” internet slang for a derisive chortle. Neural networks might be impressive linguistic mimics, but they clearly lack basic common sense.
Minutes later, Yejin Choi saw Marcus’s snarky tweet. The timing was awkward. Within the hour, Choi was scheduled to give a talk at a prominent AI conference on her latest research project: a system, nicknamed COMET, that was designed to use an earlier version of GPT-2 to perform commonsense reasoning.
Quickly, Choi—a senior research manager at the Allen Institute for AI in Seattle, who describes herself as an “adventurer at heart”—fed COMET the same prompt Marcus had used (with its wording slightly modified to match COMET’s input format):
Gary stacks kindling and logs and drops some matches.
COMET generated 10 inferences about why Gary might be dropping the matches. Not all of the responses made sense, but the first two did: He “wanted to start a fire” or “to make a fire.” Choi tweeted the results in reply to Marcus and strode up to the podium to include them in her presentation. “It seemed only appropriate,” she said.
Common sense has been called the “dark matter of AI”: both essential and frustratingly elusive. That’s because common sense consists of implicit information, the broad (and broadly shared) set of unwritten assumptions and rules of thumb that humans automatically use to make sense of the world. For example, consider the following scenario:
A man went to...