Character classification and recognition for Urdu texts in natural scene images

2018 
This work presents and compares classification and recognition methods for Urdu characters in natural scene images. Urdu is written in a cursive script, and detecting or recognizing cursive text in natural scenes is more complex and challenging than for non-cursive text. Character classification is a fundamental step in automatic text extraction from natural scenes, and the accuracy of character detection and recognition in an end-to-end system depends heavily on the effectiveness of the character classifier. Character classification for cursive languages is further complicated by the complexity and diversity of the characters. We propose a framework that combines image processing operations with a feature extraction technique to address Urdu character recognition in natural scene images. Features are extracted using the Histogram of Oriented Gradients (HOG) method and fed into five classifiers: Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Random Forest Classifier (RFC), Extra Tree Classifier (ETC), and Multi-Layer Perceptron (MLP). Because no dataset of Urdu natural scene text exists, we manually segmented characters from images and developed a dataset of 18,000 cropped Urdu characters to evaluate the proposed system. Experimental results show the effectiveness of each classification method on segmented Urdu characters in natural scene images.
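
The feature-then-classifier pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the HOG variant below is simplified (per-cell unsigned orientation histograms with per-cell L2 normalization, no overlapping block normalization), only kNN of the five classifiers is shown, and the function names, cell size, bin count, and synthetic stroke images are all assumptions made for illustration. A real system would more likely rely on an established image processing and machine learning library.

```python
import math
from collections import Counter

def hog_features(img, cell=4, bins=9):
    """Simplified HOG (assumption: no block normalization).
    img is a list of rows of grayscale values."""
    h, w = len(img), len(img[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image borders.
                    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
                    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
                    mag = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    ang = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(ang / (180.0 / bins)) % bins] += mag
            # L2-normalize each cell histogram.
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-9
            feats.extend(v / norm for v in hist)
    return feats

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, x)), label)
        for t, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return Counter(votes).most_common(1)[0][0]

def stroke(vertical, pos, size=8):
    """Hypothetical toy image: one bright line, standing in for a character crop."""
    return [[255 if (x if vertical else y) == pos else 0 for x in range(size)]
            for y in range(size)]

# Tiny synthetic training set (not the 18,000-character dataset).
train_imgs = [stroke(True, 2), stroke(True, 5), stroke(False, 2), stroke(False, 5)]
train_y = ["vertical", "vertical", "horizontal", "horizontal"]
train_X = [hog_features(im) for im in train_imgs]

pred = knn_predict(train_X, train_y, hog_features(stroke(True, 3)), k=1)
print(pred)
```

Because the descriptor encodes gradient orientation rather than raw pixel positions, an unseen stroke with the same orientation as a training stroke lands closer to it in feature space than a stroke with a different orientation. The other four classifiers named in the abstract would consume the same feature vectors.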