Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

2017

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
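The matrix-factorization view in the abstract can be checked numerically with a small sketch. Stack N contexts as rows and M vocabulary words as columns, and fill each entry with the model's log-probability of that word in that context. For a single Softmax over logits h_c · w_x, that matrix equals H Wᵀ minus a per-row normalizer, so its rank is at most d + 1, where d is the embedding dimension. This holds no matter how large N and M are. The code below assumes random Gaussian context vectors H and word embeddings W, and its sizes N, M, d and K are hypothetical values chosen for illustration. For contrast, it also takes the log of a mixture of K softmaxes, which is one way to escape the bound. This is a simplified illustration with fixed, equal mixture weights and random parameters, not the paper's trained model.

```python
import math
import random


def matmul_t(a, b):
    """Return a @ b^T for row-major lists a (n x d) and b (m x d)."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def log_softmax_rows(logits):
    """Row-wise log-softmax, shifting by the row max for numerical stability."""
    out = []
    for row in logits:
        m = max(row)
        lse = m + math.log(sum(math.exp(v - m) for v in row))
        out.append([v - lse for v in row])
    return out


def numerical_rank(matrix, tol=1e-8):
    """Rank via Gaussian elimination with partial pivoting (absolute tolerance)."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


def mixture_logprobs(context_sets, priors, words):
    """log sum_k pi_k * softmax(H_k W^T), with fixed (context-independent) priors."""
    comps = [log_softmax_rows(matmul_t(h, words)) for h in context_sets]
    out = []
    for i in range(len(comps[0])):
        row = []
        for j in range(len(words)):
            terms = [math.log(p) + comp[i][j] for p, comp in zip(priors, comps)]
            t_max = max(terms)  # log-sum-exp over mixture components
            row.append(t_max + math.log(sum(math.exp(t - t_max) for t in terms)))
        out.append(row)
    return out


rng = random.Random(0)
N, M, d, K = 10, 10, 2, 3  # contexts, vocabulary size, embedding dim, components


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


words = rand_matrix(M, d)
contexts = rand_matrix(N, d)

# Single Softmax: log P = H W^T - log Z 1^T, so rank <= d + 1.
rank_single = numerical_rank(log_softmax_rows(matmul_t(contexts, words)))

# Mixture of K softmaxes: the log of a sum is not low-rank in general.
context_sets = [rand_matrix(N, d) for _ in range(K)]
rank_mix = numerical_rank(mixture_logprobs(context_sets, [1.0 / K] * K, words))

print("single softmax rank:", rank_single, "| mixture rank:", rank_mix)
```

The single-Softmax rank stays capped at d + 1 however many contexts are added. That cap is the bottleneck: if the true log-probability matrix of natural language has higher rank, a single Softmax with d-dimensional embeddings cannot represent it exactly.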

Keywords:

Natural language processing
Effective method
Natural language
Perplexity
Matrix decomposition
Artificial intelligence
Treebank
Language model
Computer science
Softmax function
Bottleneck
