A Text Line Extraction Method for Archival Document Transcription

2020 
To support the enrichment and exploitation of archival collections, there is a growing need for computer-aided tools that assist researchers, historians and archivists in transcribing historical document images. Efficient transcription of handwritten and printed archival document images in turn requires robust text line segmentation. In this paper we propose a method that extracts whole text lines from archival document images. The method builds on our previous work reported at ICDAR 2019, which extracted only the main area covering the text core. This paper introduces a post-processing step, based on topological structural analysis of binary images, that extracts whole text lines, including their ascender and descender components. To illustrate the effectiveness of the proposed method, we conducted experiments on archival document images collected from the Tunisian national archives. Qualitative and quantitative results are reported and discussed.
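The idea of growing a text-core area until it covers ascenders and descenders can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps in plain 8-connected component labeling for the topological structural analysis the abstract names, and the function names (`connected_components`, `extend_core_band`), the row-band representation of the text core, and the toy image are all assumptions made for illustration.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground (value 1) components as lists of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def extend_core_band(img, core_top, core_bottom):
    """Grow the (hypothetical) text-core row band to cover every component touching it."""
    top, bottom = core_top, core_bottom
    for pixels in connected_components(img):
        rows = [y for y, _ in pixels]
        # A component belongs to the line if its row span intersects the core band.
        if min(rows) <= core_bottom and max(rows) >= core_top:
            top = min(top, min(rows))
            bottom = max(bottom, max(rows))
    return top, bottom


# Toy binary image: the text core occupies rows 3-4, one stroke rises above it,
# another drops below it, and an isolated speck sits at row 8.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
full_line = extend_core_band(page, 3, 4)
```

In this toy case the band grows to include the rising and descending strokes, while the speck at row 8 is left out because it never touches the core. A real document would also need to handle components that touch two adjacent lines, which this sketch ignores.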