Multi-task Learning for Newspaper Image Segmentation and Baseline Detection Using Attention-Based U-Net Architecture.

2021 
In this work, we propose an end-to-end, language-agnostic, multi-task U-Net framework for text block segmentation and baseline detection in document images. We improve on the plain U-Net by adding attention layers to the skip connections between the contracting and expansive paths. The model's generalization ability is also validated on handwritten images. We perform exhaustive experiments on the ICPR2020 challenge dataset and obtain test accuracies of 96.09% and 99.44% for baseline detection and text block segmentation, respectively, on the simple track, and 97.47% and 98.51% for baseline detection and text block segmentation, respectively, on the complex track. The source code is publicly available at https://github.com/divyanshjoshi/Attention-U-Net-Newspaper-Text-Block-Segmentation.
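
To make the idea of attention on a skip connection concrete, here is a minimal NumPy sketch of an additive attention gate. The abstract does not specify the form of the attention layers, so the additive gating used here, the function name `attention_gate`, the weight names, and the tensor shapes are all assumptions for illustration; this is not the authors' implementation, which is available at the repository linked above.

```python
import numpy as np

def attention_gate(x, g, w_x, w_g, w_psi, b_psi=0.0):
    """Additive attention gate on a skip connection (illustrative sketch).

    x     : skip features from the contracting path, shape (C_x, H, W)
    g     : gating features from the expansive path, shape (C_g, H, W),
            assumed already resampled to the same H x W as x
    w_x   : 1x1-convolution weights for x, shape (C_int, C_x)
    w_g   : 1x1-convolution weights for g, shape (C_int, C_g)
    w_psi : weights collapsing C_int channels to one map, shape (C_int,)
    """
    # A 1x1 convolution is a per-pixel linear map over channels.
    theta = np.einsum("ic,chw->ihw", w_x, x)
    phi = np.einsum("ic,chw->ihw", w_g, g)
    # Combine both signals additively, then apply ReLU.
    f = np.maximum(theta + phi, 0.0)
    # Collapse to a single-channel map and squash to [0, 1] with a sigmoid.
    logits = np.einsum("i,ihw->hw", w_psi, f) + b_psi
    alpha = 1.0 / (1.0 + np.exp(-logits))
    # Rescale every channel of the skip features by the attention map.
    return x * alpha[None, :, :], alpha

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))   # skip features
g = rng.standard_normal((4, 16, 16))   # gating features
w_x = 0.1 * rng.standard_normal((6, 8))
w_g = 0.1 * rng.standard_normal((6, 4))
w_psi = 0.1 * rng.standard_normal(6)

gated, alpha = attention_gate(x, g, w_x, w_g, w_psi)
print(gated.shape, alpha.shape)
```

In a multi-task setting such as the one described, the gated features would feed a shared decoder whose output is split into one head per task (text block segmentation and baseline detection); that part is not shown here.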