Towards combining rule-based and statistical part of speech tagging in agglutinative languages

Levent Altunyurt, Zihni Orhan, Tunga Gungor

2007

We present a composite part-of-speech tagger for Turkish that combines the rule-based and statistical approaches. The tagger makes use of word frequencies and n-gram statistics from a corpus. We use the output of a morphological analyzer to obtain more accurate results and to eliminate the sparse data problem. In addition, we employ a heuristic based on the position of words in the sentence. Although the experiments were performed on a very small corpus, the results show that using the composite approach together with the heuristic improves the accuracy of the tagger.
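The abstract does not spell out the tagger's scoring model, so the following is only a rough sketch of how the ingredients it lists might fit together, not the authors' method. It assumes a bigram (rather than trigram) model decoded with Viterbi, add-one smoothing, a morphological analyzer simulated by a hand-written `analyses` dictionary that restricts each word to its candidate tags (so an unseen word still gets a uniform choice among plausible tags instead of all tags), and a position heuristic that is assumed, for illustration, to reward a verb in sentence-final position since Turkish is predominantly verb-final. The names `train`, `tag_sentence` and `final_verb_bonus`, the tag set and the toy corpus are all hypothetical.

```python
import math
from collections import defaultdict


def train(tagged_sentences):
    """Count word/tag pairs and tag bigrams from a (hypothetical) hand-tagged corpus."""
    word_tag = defaultdict(lambda: defaultdict(int))
    bigram = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            word_tag[word][tag] += 1
            bigram[prev][tag] += 1
            prev = tag
    return word_tag, bigram


def tag_sentence(words, analyses, word_tag, bigram, tagset, final_verb_bonus=2.0):
    """Bigram Viterbi decoding restricted to the tags a morphological analyzer allows.

    analyses: word -> set of candidate tags (a stand-in for a real analyzer's output).
    """

    def emission(word, tag, allowed):
        counts = word_tag.get(word, {})
        # Add-one smoothing over the analyzer's candidates only; for an unseen
        # word this reduces to a uniform choice among the analyzer's tags.
        total = sum(counts.get(t, 0) for t in allowed) + len(allowed)
        return math.log((counts.get(tag, 0) + 1) / total)

    def transition(prev, tag):
        row = bigram.get(prev, {})
        return math.log((row.get(tag, 0) + 1) / (sum(row.values()) + len(tagset)))

    best = {"<s>": (0.0, [])}  # last tag -> (log score, best path ending in it)
    for i, word in enumerate(words):
        allowed = analyses.get(word, tagset)  # no analysis: fall back to every tag
        new_best = {}
        for tag in sorted(allowed):
            score = emission(word, tag, allowed)
            if i == len(words) - 1 and tag == "Verb":
                # Assumed position heuristic: favour a sentence-final verb.
                score += math.log(final_verb_bonus)
            prev_score, prev_path = max(
                (s + transition(p, tag), path) for p, (s, path) in best.items()
            )
            new_best[tag] = (prev_score + score, prev_path + [tag])
        best = new_best
    return max(best.values())[1]


# Toy usage: the corpus and analyzer output below are invented for illustration.
corpus = [
    [("ali", "Noun"), ("okula", "Noun"), ("gitti", "Verb")],
    [("güzel", "Adj"), ("ev", "Noun"), ("gördü", "Verb")],
    [("ben", "Pron"), ("kitap", "Noun"), ("okudum", "Verb")],
]
word_tag, bigram = train(corpus)
tagset = {"Noun", "Verb", "Adj", "Pron"}
analyses = {"ayşe": {"Noun"}, "kitap": {"Noun"}, "okudu": {"Noun", "Verb"}}
print(tag_sentence(["ayşe", "kitap", "okudu"], analyses, word_tag, bigram, tagset))
# → ['Noun', 'Noun', 'Verb']
```

Restricting decoding to the analyzer's candidates is what lets the unseen words "ayşe" and "okudu" be tagged at all from such a tiny corpus; a trigram version would key the transition table on the two previous tags instead of one.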

Keywords:

Part-of-speech tagging
Word lists by frequency
Rule-based system
Part of speech
Heuristics
Sparse matrix
Artificial intelligence
Pattern recognition
Computer science
Trigram tagger
Agglutinative language
Speech recognition
Natural language processing
Turkish