OctaNLP: A Benchmark for Evaluating Multitask Generalization of Transformer-Based Pre-trained Language Models

2022 
In the last decade, deep learning-based Natural Language Processing (NLP) models have achieved remarkable performance on the majority of NLP tasks, especially in machine translation, question answering, and dialogue. NLP language models have shifted from non-contextualized vector space models such as word2vec and GloVe in 2013 and 2014, to contextualized LSTM-based models such as ELMo and ULMFiT in 2018, and then to contextualized transformer-based models such as BERT. Transformer-based pre-trained language models perform very well on individual NLP tasks. However, when applied to many tasks simultaneously, their performance drops considerably. In this paper, we review NLP evaluation metrics, multitask benchmarks, and recent transformer-based language models. We discuss the limitations of current multitask benchmarks and propose our octaNLP benchmark for comparing the generalization capabilities of transformer-based pre-trained language models across multiple downstream NLP tasks simultaneously.
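To make the idea of multitask evaluation concrete, the following is a minimal, illustrative sketch (not taken from the paper) of how per-task scores for a single pre-trained model could be aggregated into one benchmark score; the task names and score values are hypothetical placeholders, and the unweighted macro-average is only one possible aggregation scheme.

from statistics import mean

def aggregate_benchmark(per_task_scores):
    """Unweighted macro-average over all per-task scores (GLUE-style aggregation)."""
    return mean(per_task_scores.values())

if __name__ == "__main__":
    # Hypothetical per-task results for one pre-trained model; each score is
    # assumed to already be normalized to [0, 1] (accuracy, F1, rescaled BLEU, ...).
    scores = {
        "question_answering": 0.81,
        "machine_translation": 0.34,
        "dialogue": 0.47,
        "summarization": 0.29,
    }
    print(f"Aggregate multitask score: {aggregate_benchmark(scores):.3f}")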