STAr: A Segmentation System for Arabic Texts Based on the Contextual Analysis of Punctuation Marks and Certain Particles

2005 
In this paper we present a tokenizer for non-vowelled Arabic texts based on a contextual analysis of punctuation marks and of a list of particles, such as coordinating conjunctions. STAr takes an Arabic text (in .txt format) as input and outputs the text segmented into paragraphs and sentences. The design of STAr is based on a real corpus covering different types of texts, and the system is implemented in Perl using regular expressions.
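The approach combines punctuation marks with the context in which they appear, including neighbouring particles. The Python sketch below illustrates that idea under stated assumptions. It is not STAr's Perl implementation: the punctuation sets, the particle list `PARTICLES`, and the function `segment` are hypothetical choices made for illustration only. Paragraphs are taken to be separated by blank lines; strong marks always end a sentence, while weaker marks such as the Arabic comma end a sentence only when the next word is one of the listed particles.

```python
import re

# Hypothetical rule set for illustration; STAr's actual rules were
# derived from its corpus and are not reproduced here.
STRONG_PUNCT = ".!?؟"            # sentence-final marks (Latin and Arabic question mark)
CLAUSE_PUNCT = "،,;؛"            # weaker marks: split only before a listed particle
PARTICLES = ["ثم", "لكن", "بل"]   # assumed particle list ("then", "but", "rather")

# Split on whitespace that follows a strong mark.
_strong = re.compile(r"(?<=[" + re.escape(STRONG_PUNCT) + r"])\s+")

# Split on whitespace that follows a weak mark, but only if the next
# word is exactly one of the particles (\b is Unicode-aware in Python 3).
_weak = re.compile(
    r"(?<=[" + re.escape(CLAUSE_PUNCT) + r"])\s+(?=(?:"
    + "|".join(PARTICLES) + r")\b)"
)


def segment(text):
    """Return a list of paragraphs, each given as a list of sentence strings."""
    paragraphs = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for para in re.split(r"\n\s*\n", text.strip()):
        para = " ".join(para.split())  # collapse internal line breaks and spaces
        if not para:
            continue
        sentences = []
        for chunk in _strong.split(para):
            # Within each strong-mark chunk, apply the particle-dependent rule.
            sentences.extend(s for s in _weak.split(chunk) if s)
        paragraphs.append(sentences)
    return paragraphs


if __name__ == "__main__":
    sample = "ذهب الولد إلى المدرسة. ثم عاد إلى البيت، ثم نام.\n\nهل أنت هنا؟ نعم."
    for i, para in enumerate(segment(sample), 1):
        for sentence in para:
            print(i, sentence)
```

Restricting the weak marks to a particle context avoids splitting every comma-separated phrase. Note that a particle written attached to the following word, such as the conjunction و (wa), is not matched by this whole-word rule and would need additional handling.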