The Best NLP Papers From ICLR 2020

I went through the 687 papers accepted to the ICLR 2020 virtual conference (out of 2594 submissions, up 63% from 2019) and identified 9 papers with the potential to advance the use of deep learning NLP models in everyday use cases.
Here are the papers I found and why they matter.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
Main Contribution:  A common pre-training task for language models is to mask part of the input and have the model predict the masked tokens. This paper introduces a new pre-training task called replaced token detection. In this task, the authors replace some tokens with alternatives sampled from a generator. They then train a discriminator to predict, for each token in the input, whether the generator replaced it or not.
Why It Matters:  This task is more data efficient, learning potentially from all tokens in a dataset versus the ~15% masked in the usual approach. It shows there’s still room for additional creativity in how to train a language model.

An overview of replaced token detection
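
To make the task concrete, here is a minimal sketch of how one replaced token detection training example could be built. It is not the authors' implementation: the function name make_rtd_example, the toy vocabulary, and the uniform random "generator" are all assumptions for illustration (in the paper the generator is a trained masked language model). The discriminator's targets are the per-token labels, so every position contributes to the loss.

```python
import random


def make_rtd_example(tokens, vocab, replace_rate=0.15, rng=None):
    """Build one replaced-token-detection example (illustrative only).

    Returns (corrupted_tokens, labels), where labels[i] is 1 if the token
    at position i differs from the original and 0 otherwise.
    """
    rng = rng or random.Random()
    # Choose roughly `replace_rate` of the positions to corrupt.
    n_replace = min(len(tokens), max(1, round(replace_rate * len(tokens))))
    positions = rng.sample(range(len(tokens)), n_replace)

    corrupted = list(tokens)
    for i in positions:
        # Stand-in "generator" (assumption): a uniform draw from the vocabulary.
        corrupted[i] = rng.choice(vocab)

    # If the generator happens to reproduce the original token,
    # that position is labeled as original (0), not replaced.
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the chef cooked the meal".split()
    toy_vocab = ["the", "chef", "cooked", "meal", "ate", "a"]
    corrupted, labels = make_rtd_example(sentence, toy_vocab, rng=random.Random(0))
    print(corrupted)
    print(labels)
```

Note that the label list covers the whole sequence, which is where the data-efficiency argument comes from: the discriminator gets a training signal at every position, not only at the corrupted ones.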
The Curious Case of Neural Text Degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi
Main Contribution:  The authors propose a new decoding strategy called nucleus sampling, which truncates the tail of the probability distribution and samples from the dynamic nucleus of tokens containing the vast majority of the probability mass. The motivating, counter-intuitive observation is that although likelihood as a training objective yields high-quality models for a broad range of language understanding tasks, using likelihood as a decoding objective produces text that is bland and strangely repetitive.
Why It Matters:  Text degeneration is an issue even in the latest cutting-edge language models. Decoding strategies are important for generating more human-like text across various tasks. Moving away from likelihood-maximizing decoding such as beam search may help performance on downstream generation tasks.

Example of beam search based generation vs human generation
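
For readers who want to see the mechanics, here is a minimal sketch of nucleus (top-p) sampling over a single next-token distribution. It assumes the distribution is given as a plain list of probabilities; the function names top_p_nucleus and nucleus_sample and the threshold value are my own choices for illustration, not taken from the authors' code.

```python
import random


def top_p_nucleus(probs, p=0.9):
    """Return the smallest set of token indices, taken in order of
    decreasing probability, whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return nucleus


def nucleus_sample(probs, p=0.9, rng=None):
    """Sample one token index from the top-p nucleus of `probs`."""
    rng = rng or random.Random()
    nucleus = top_p_nucleus(probs, p)
    # random.choices accepts unnormalised weights, so the probabilities
    # inside the nucleus are effectively renormalised here.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    next_token_probs = [0.5, 0.3, 0.15, 0.05]
    print(top_p_nucleus(next_token_probs, p=0.7))
    print(nucleus_sample(next_token_probs, p=0.7, rng=random.Random(0)))
```

The nucleus is "dynamic" in the sense that its size depends on the shape of the distribution: a peaked distribution yields a nucleus of one or two tokens, while a flat one keeps many candidates.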
What Can Neural Networks Reason About?
Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
Main Contribution:  This paper introduces a framework called algorithmic alignment to characterize how well neural networks can learn reasoning tasks. Neural networks that “align” with known algorithmic solutions are better able to learn the solutions. The framework roughly states that for the model to be able to learn and successfully generalize on a reasoning task, it needs to be able to easily learn (to approximate) steps of the...
