The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

2020 
Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common-sense reasoning ability. Using a new diagnostic dataset, however, we show that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans; humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than on associative ones. Overall, humans are correct more often than out-of-the-box models, and the models are sometimes right for the wrong reasons. Finally, we show that fine-tuning on a large, task-specific dataset can offer a solution to these issues.
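As a rough illustration of this kind of evaluation, the sketch below resolves a Winograd schema by substituting each candidate referent for the ambiguous pronoun and comparing sentence log-probabilities under a pretrained causal language model, then repeats the comparison after a synonym replacement. This is a minimal sketch, not the paper's method: the model choice (GPT-2), the HuggingFace transformers API, and the example sentences are assumptions for illustration; the actual models, perturbation set, and scoring procedure in the paper may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch: GPT-2 and these example sentences stand in for the
# paper's actual models and stimuli.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the causal LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # model(...).loss is the mean per-token negative log-likelihood;
        # scale by the number of predicted tokens to recover the total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def resolve(template: str, candidates: tuple) -> str:
    """Return the candidate whose substitution yields the likelier sentence."""
    scores = {c: sentence_logprob(template.format(c)) for c in candidates}
    return max(scores, key=scores.get)

candidates = ("the trophy", "the suitcase")
original = "The trophy didn't fit in the suitcase because {} was too big."
# Synonym replacement that leaves the intended resolution unchanged for humans.
perturbed = "The trophy didn't fit in the suitcase because {} was too large."

print("original: ", resolve(original, candidates))
print("perturbed:", resolve(perturbed, candidates))
```

Under this setup, a flip in the predicted referent between the two runs would count as sensitivity to a perturbation that barely affects human understanding; the paper's fine-tuning result suggests such flips become rarer after task-specific training.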