STAr: An Arabic Text Segmentation System Based on Contextual Analysis of Punctuation Marks and Certain Particles
2005
We present STAr, a tokenizer for unvowelled Arabic texts based on a contextual analysis of punctuation marks and a list of particles, such as coordinating conjunctions. STAr takes an Arabic text (in .txt format) as input and outputs the same text segmented into paragraphs and sentences. Its design is based on a real corpus covering different types of texts, and it is implemented in Perl using regular expressions.
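As a rough illustration of the kind of rule the abstract describes, the sketch below splits a text into paragraphs and sentences using punctuation in context plus a small particle list. It is not the authors' implementation: it is written in Python rather than Perl, and the particle list, the rule set, and the names `PARTICLES`, `BOUNDARY`, `split_sentences`, and `segment` are all assumptions made for this example.

```python
import re

# Hypothetical particle list, for illustration only ("but", "then").
# The actual list used by STAr is not given in the abstract.
PARTICLES = ["لكن", "ثم"]

# Assumed boundary rules:
#  1. a run of strong marks (. ! ? or the Arabic question mark) followed by
#     whitespace or end of text; a period inside "3.5" is followed by a
#     digit, so it is not treated as a boundary;
#  2. an Arabic comma or semicolon, but only when the next word is one of
#     the listed particles.
BOUNDARY = re.compile(
    r"[.!?؟]+(?=\s|$)"
    r"|[،؛](?=\s+(?:%s)\s)" % "|".join(re.escape(p) for p in PARTICLES)
)


def split_sentences(paragraph):
    """Cut a paragraph after each boundary match; keep the mark with its sentence."""
    sentences, start = [], 0
    for match in BOUNDARY.finditer(paragraph):
        piece = paragraph[start:match.end()].strip()
        if piece:
            sentences.append(piece)
        start = match.end()
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def segment(text):
    """Return a list of paragraphs, each a list of sentences.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [split_sentences(p) for p in paragraphs]


if __name__ == "__main__":
    sample = "ذهب الولد إلى المدرسة. هل عاد؟\n\nالسعر 3.5 دينار، ثم ارتفع."
    for number, paragraph in enumerate(segment(sample), start=1):
        for sentence in paragraph:
            print(number, sentence)
```

The contextual part is in the lookaheads: the same mark is or is not a boundary depending on what follows it, which is why a decimal point is left alone and a comma splits only before a listed particle.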