Towards combining rule-based and statistical part of speech tagging in agglutinative languages

2007 
We present a composite part-of-speech tagger for Turkish that combines rule-based and statistical approaches. The tagger uses word frequencies and n-gram statistics drawn from a corpus. We use the output of a morphological analyzer to obtain more accurate results and to eliminate the sparse data problem. In addition, we employ a heuristic based on the position of words in the sentence. Although the experiments were performed on a very small corpus, the results show that the composite approach and the heuristic improve the accuracy of the tagger.
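The abstract does not give the scoring details, so the following is only a minimal sketch of how such a composite tagger might be put together, not the authors' method. It assumes a hypothetical `analyses` dictionary standing in for the morphological analyzer (it lists the candidate tags for each word), add-one smoothed word/tag and tag-bigram counts, a greedy left-to-right decision, and an illustrative positional heuristic that favors a verb reading for the sentence-final word. The tag names, the `final_bonus` parameter, and the scoring formula are all assumptions made for illustration.

```python
from collections import Counter


def train(tagged_sentences):
    """Collect word/tag frequencies and tag-bigram counts from a tagged corpus."""
    word_tag = Counter()   # (word, tag) -> count
    bigram = Counter()     # (previous tag, tag) -> count
    tag_count = Counter()  # tag -> count, including the start symbol "<s>"
    for sentence in tagged_sentences:
        prev = "<s>"
        tag_count[prev] += 1
        for word, tag in sentence:
            word_tag[(word, tag)] += 1
            bigram[(prev, tag)] += 1
            tag_count[tag] += 1
            prev = tag
    return word_tag, bigram, tag_count


def tag_sentence(words, analyses, model, final_bonus=2.0):
    """Greedily pick one tag per word from the analyzer's candidate tags.

    analyses: hypothetical stand-in for a morphological analyzer,
              mapping each word to its list of possible tags.
    final_bonus: assumed positional heuristic, a multiplier applied to a
                 Verb reading of the last word in the sentence.
    """
    word_tag, bigram, tag_count = model
    prev = "<s>"
    result = []
    for i, word in enumerate(words):
        # Restricting to analyzer candidates is what limits sparse-data errors:
        # unseen words still get a small, linguistically valid tag set.
        candidates = analyses.get(word, ["Noun"])  # assumed default for unknown words
        best_tag, best_score = None, -1.0
        for tag in candidates:
            lexical = word_tag[(word, tag)] + 1  # add-one smoothed word frequency
            context = (bigram[(prev, tag)] + 1) / (tag_count[prev] + len(candidates))
            score = lexical * context
            if i == len(words) - 1 and tag == "Verb":
                score *= final_bonus  # positional heuristic (assumption)
            if score > best_score:
                best_tag, best_score = tag, score
        result.append(best_tag)
        prev = best_tag
    return result


# Toy usage: "yüz" is ambiguous (face / swim / hundred) in Turkish.
corpus = [[("yüz", "Num"), ("kitap", "Noun")]]
analyses = {"yüz": ["Noun", "Verb", "Num"], "kitap": ["Noun"], "çocuk": ["Noun"]}
model = train(corpus)
tags = tag_sentence(["yüz", "kitap"], analyses, model)
```

In this sketch the analyzer output acts as the rule-based component by pruning impossible tags, while the counts supply the statistical preference among the remaining candidates. A fuller implementation would likely replace the greedy choice with a search over whole tag sequences.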