Tagsets and Datasets: Some Experiments Based on Portuguese Language

2018 
We report the results of two experiments aimed at investigating the impact of linguistic variation on PoS tagging. In both cases, we depart from the conversion of the corpus MacMorpho [1], which was re-annotated according to the Universal Dependencies PoS tagset. Throughout the conversion process, we faced some linguistic challenges related to the past participle forms. As a result, we created two corpora (MacMoprho-UD and MacMorpho-UD+PCP). We used these three corpora (MacMorpho; MacMoprho-UD and MacMorpho-UD+PCP) to assess the impact on PoS learning in different scenarios.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    15
    References
    2
    Citations
    NaN
    KQI
    []