Rethinking Our Assumptions About Language Model Evaluation

2020 
Many applications of pre-trained language models use their learned internal representations, also known as word or sentence embeddings, as input features for other language-based tasks. Over recent years, this has led to the implicit assumption that the quality of such embeddings is determined solely by their ability to facilitate transfer learning. In this position paper, we argue that pre-trained linguistic embeddings have value above and beyond their utility as input features for downstream tasks. We adopt a paradigm in which they are instead treated as implicit knowledge repositories that can be used to solve common-sense reasoning problems via linear operations on embedded text. To validate this paradigm, we apply our methodology to tasks such as threat detection, emotion classification, and sentiment analysis, and demonstrate that linguistic embeddings show strong potential for solving such tasks directly, without the need for additional training. Motivated by these results, we advocate for empirical evaluations of language models that include vector-based reasoning tasks in addition to more traditional benchmarks, with the ultimate goal of facilitating language-based reasoning, or ‘reasoning in the linguistic domain’. We conclude by analyzing the structure of currently available embedding models and identifying several shortcomings that must be overcome in order to realize the full potential of this approach.
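
To make the paradigm concrete, the sketch below shows one way such vector-based reasoning can work: a zero-shot sentiment classifier that embeds a sentence and assigns the label whose embedded description is nearest in cosine similarity, with no task-specific training. This is a minimal illustration under assumed choices; the embedding model (sentence-transformers' all-MiniLM-L6-v2) and the label phrases are stand-ins, not the models or tasks evaluated in the paper.

```python
# Minimal sketch of embeddings as implicit knowledge repositories:
# solve sentiment analysis via linear operations on embedded text,
# with no additional training. Model and label phrases are
# illustrative assumptions, not the paper's exact setup.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

# Anchor vectors: embeddings of short natural-language label descriptions.
labels = ["positive sentiment", "negative sentiment"]
anchors = model.encode(labels)  # shape: (num_labels, embedding_dim)

def classify(sentence: str) -> str:
    """Return the label whose anchor embedding is closest in cosine similarity."""
    v = model.encode([sentence])[0]
    sims = anchors @ v / (np.linalg.norm(anchors, axis=1) * np.linalg.norm(v))
    return labels[int(np.argmax(sims))]

print(classify("The film was a delight from start to finish."))
# Expected: 'positive sentiment' for a reasonable encoder.
```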