Neural Text Categorization with Transformers for Learning Portuguese as a Second Language.

2021 
We report on the application of a neural network based approach to the problem of automatically categorizing texts according to their proficiency levels and suitability for learners of Portuguese as a second language. We resort to a particular deep learning architecture, namely Transformers, as we fine-tune GPT-2 and RoBERTa on data sets labeled with respect to the standard CEFR proficiency levels, that were provided by Camoes IC, the Portuguese official language institute. Despite the reduced size of the data sets available, we found that the resulting models overperform previous carefully crafted feature based counterparts in most evaluation scenarios, thus offering a new state-of-the-art for this task in what concerns the Portuguese language.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    39
    References
    0
    Citations
    NaN
    KQI
    []