Aksara: An Indonesian Morphological Analyzer that Conforms to the UD v2 Annotation Guidelines

2020 
This work presents Aksara, an Indonesian morphological analyzer that conforms to the Universal Dependencies (UD) annotation guidelines, specifically UD v2. Several Indonesian morphological analyzers have been developed, but as far as we know none conforms to the UD annotation guidelines. Aksara follows the same approach as MorphInd, another Indonesian morphological analyzer, which is built with the finite-state compiler Foma. Aksara performs four tasks: 1) word segmentation, 2) lemmatization, 3) POS tagging, and 4) morphological feature analysis. To evaluate the tool, we used an Indonesian dependency treebank that conforms to UD v2 as the gold standard. We also compared Aksara with MorphInd by mapping MorphInd's output to the CoNLL-U format. The experimental results show that Aksara outperforms MorphInd on all four tasks. Aksara achieves an accuracy of 96.9% on word segmentation, a case-sensitive accuracy of 94.83% on lemmatization, and an F1-score of 88.2% on POS tagging; for morphological feature analysis, nine of the 18 feature-value tags implemented so far have an F1-score above 80%.
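As an illustration of the kind of comparison described above, the following minimal Python sketch scores a predicted CoNLL-U analysis against a gold one. It is not Aksara's or the authors' evaluation code. The function names (`read_tokens`, `lemma_accuracy`, `upos_f1`) and the three-word sample sentence are hypothetical, and the sketch assumes that gold and predicted tokens are already aligned one to one, which a real evaluation after word segmentation cannot take for granted.

```python
from collections import Counter


def read_tokens(conllu_text):
    """Return (form, lemma, upos) for each word line in a CoNLL-U string."""
    tokens = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        tokens.append((cols[1], cols[2], cols[3]))
    return tokens


def lemma_accuracy(gold, pred, case_sensitive=True):
    """Fraction of aligned tokens whose lemma matches the gold lemma."""
    norm = (lambda s: s) if case_sensitive else str.lower
    correct = sum(norm(g[1]) == norm(p[1]) for g, p in zip(gold, pred))
    return correct / len(gold)


def upos_f1(gold, pred):
    """Per-tag F1 for UPOS over aligned tokens."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g[2] == p[2]:
            tp[g[2]] += 1
        else:
            fp[p[2]] += 1  # predicted tag was wrong
            fn[g[2]] += 1  # gold tag was missed
    scores = {}
    for tag in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores[tag] = 2 * tp[tag] / denom if denom else 0.0
    return scores


# Hypothetical sample data, not taken from the treebank used in the paper.
GOLD = (
    "1\tDia\tdia\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmembaca\tbaca\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tbuku\tbuku\tNOUN\t_\t_\t2\tobj\t_\t_\n"
)
# Same sentence with one wrong lemma (token 2) and one wrong tag (token 3).
PRED = (
    "1\tDia\tdia\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmembaca\tmembaca\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tbuku\tbuku\tPROPN\t_\t_\t2\tobj\t_\t_\n"
)

gold_tokens = read_tokens(GOLD)
pred_tokens = read_tokens(PRED)
print(lemma_accuracy(gold_tokens, pred_tokens))
print(upos_f1(gold_tokens, pred_tokens))
```

The sketch reports F1 per tag; how the single POS-tagging F1-score quoted above was aggregated is not stated here, so no aggregation is attempted.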