Adjusting Indonesian Multiword Expression Annotation to the Penn Treebank Format

2020 
Multiword expressions (MWEs) have been a pain in the neck, especially when determining their word classes in a syntactic treebank. Previous work proposed annotation guidelines for Indonesian MWEs that align with the Penn Treebank (PTB) format. However, we think that the proposed annotation still needs improvement. Therefore, this study proposes a new annotation guideline for labeling Indonesian MWEs that conforms to the PTB format. We also revised the MWE annotation of an existing Indonesian constituency treebank of 1030 sentences to conform to the new guideline. To evaluate the quality of the revised treebank, we built an Indonesian constituency parser model using the revised treebank and the Stanford Parser. The experiments show that the resulting parser achieves an F1-score of 69.97%.