Fine-Grained Language Identification in Scene Text Images

2021 
Identifying the language of text in scene images is crucial for various applications. Existing studies focus on identifying the script, that is, the set of characters used to write one or more languages, in scene text images. However, these works do not distinguish between different languages written in the same script and thus cannot meet the needs of many applications. To address this challenge, we study a novel task: fine-grained language identification in scene text images, which aims to distinguish languages that share the same script. We construct datasets that include samples in seven languages: Dutch, English, French, Italian, German, Spanish, and Portuguese. Furthermore, we propose end-to-end trainable neural networks for fine-grained language identification, in which semantic information about the text is mined and used to assist language identification. We train the networks on the synthetic dataset and evaluate them on the collected real dataset. The experimental results demonstrate that the proposed frameworks are effective.