Design of a Font Size Independent OCR System for Urdu Nastaleeq Based Printed Text

2021 
The Urdu Nastaleeq font is considered a standard composing font for Urdu as well as Arabic script. It is widely used in the Subcontinent and the Middle East in printed media, books and old historical scrolls. However, much useful knowledge written in Urdu is not available electronically, owing to the lack of techniques for digitizing old handwritten or printed scripts. We develop a font-size-independent optical character recognition (OCR) system to recognize Urdu script written in Nastaleeq. In the first step, preprocessing is performed using binarization, median filtering and thinning of the scanned image. In the second step, line and ligature segmentation are performed. Classification, or recognition, of the Urdu font is achieved using a Back Propagation Multilayer Perceptron Neural Network (BP-MLP-NN), where multiple features such as raw, central and scale-invariant moments, along with area, centroid and orientation, are used to train the network and recognize characters. The designed prototype successfully detected individual Urdu characters with 98% accuracy on a self-generated database and 96% accuracy on scanned textbook data.
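To make the feature set concrete, the following is a minimal sketch of how image-moment features of the kind listed above (area, centroid, orientation, and scale-normalized central moments) could be computed from a binarized glyph. It is an illustration under stated assumptions, not the authors' implementation: the function names `moment_features` and `upscale`, the dictionary keys, and the representation of a glyph as a list of 0/1 rows are all hypothetical, and only second-order moments are shown.

```python
import math

def moment_features(img):
    """Compute moment-based shape features of a binary image.

    img is a list of rows of 0/1 integers (1 = ink pixel). This
    representation is an assumption made for the sketch.
    """
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("image contains no foreground pixels")

    def raw(p, q):
        # Raw moment m_pq = sum over ink pixels of x^p * y^q
        return sum((x ** p) * (y ** q) for x, y in pts)

    m00 = raw(0, 0)                      # area (number of ink pixels)
    cx, cy = raw(1, 0) / m00, raw(0, 1) / m00  # centroid

    def central(p, q):
        # Central moment mu_pq, taken about the centroid (translation invariant)
        return sum(((x - cx) ** p) * ((y - cy) ** q) for x, y in pts)

    def eta(p, q):
        # Scale-normalized central moment: mu_pq / m00^(1 + (p + q) / 2)
        return central(p, q) / m00 ** (1 + (p + q) / 2)

    mu20, mu02, mu11 = central(2, 0), central(0, 2), central(1, 1)
    # Angle of the principal axis, in radians, from second-order central moments
    orientation = 0.5 * math.atan2(2 * mu11, mu20 - mu02)

    return {
        "area": m00,
        "centroid": (cx, cy),
        "orientation": orientation,
        "eta20": eta(2, 0),
        "eta02": eta(0, 2),
        "eta11": eta(1, 1),
    }

def upscale(img, k):
    """Enlarge a binary image by an integer factor k (each pixel becomes k x k)."""
    return [[v for v in row for _ in range(k)] for row in img for _ in range(k)]

# Usage: the same toy shape at two sizes, standing in for two font sizes.
glyph = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
small = moment_features(glyph)
large = moment_features(upscale(glyph, 3))
print(small["area"], large["area"])      # area grows with the square of the scale
print(small["eta20"], large["eta20"])    # normalized moments stay close
```

The normalized moments of the small and enlarged shapes agree only approximately, because pixels are discrete; the gap shrinks as the glyph contains more pixels. This is the property that makes such features plausible inputs for a classifier that should not depend on font size.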