Document Image Retrieval with Local Feature Sequences

2009 
In recent years, many document image retrieval algorithms have been proposed. However, most current approaches either require good-quality images or depend on the page layout structure. This paper presents a fast, accurate, and OCR-free image retrieval algorithm using local feature sequences, which describe the intrinsic, unique, and page-layout-free characteristics of document images. With a simple preprocessing step, the local feature sequences can be extracted without print-core detection or image registration. An efficient coarse-to-fine common substring matching strategy is then applied to match the local feature sequences. Beyond producing a single matching score, this approach can locate the matched parts word by word. It copes well with challenges including low resolution, different languages, rotation, incompleteness, and N-up. Encouraging experimental results on a large-scale document image database show that the retrieval outputs are sufficiently good to be used directly as document image identification results.
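
The abstract does not spell out how the coarse-to-fine common substring matching works, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each document has already been reduced to a sequence of integers, one per word, standing in for the local features (the actual features are not specified here). The coarse stage finds positions where short runs of `k` features agree, and the fine stage extends each such seed to a maximal common substring, which gives word-level match positions. The names `coarse_candidates`, `fine_extend`, `match_sequences`, and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def coarse_candidates(query, doc, k=3):
    """Coarse stage (assumed): positions (i, j) where a k-gram of query equals a k-gram of doc."""
    index = defaultdict(list)
    for j in range(len(doc) - k + 1):
        index[tuple(doc[j:j + k])].append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(tuple(query[i:i + k]), []):
            seeds.append((i, j))
    return seeds


def fine_extend(query, doc, i, j):
    """Fine stage (assumed): grow a seed left and right into a maximal common substring."""
    start_q, start_d = i, j
    while start_q > 0 and start_d > 0 and query[start_q - 1] == doc[start_d - 1]:
        start_q -= 1
        start_d -= 1
    end_q, end_d = i, j
    while end_q < len(query) and end_d < len(doc) and query[end_q] == doc[end_d]:
        end_q += 1
        end_d += 1
    # (start in query, start in doc, number of matched words)
    return start_q, start_d, end_q - start_q


def match_sequences(query, doc, k=3):
    """Return distinct maximal matches, longest first."""
    matches = {fine_extend(query, doc, i, j) for i, j in coarse_candidates(query, doc, k)}
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical per-word feature sequences; the doc contains a 5-word run of the query.
query = [5, 3, 4, 2, 6, 1, 9]
doc = [7, 7, 5, 3, 4, 2, 6, 8, 1, 9]
print(match_sequences(query, doc))
```

Each returned tuple gives the start of the match in the query, the start in the stored document, and the number of consecutive matched words. A retrieval score could then be derived from the total matched length, though how the paper computes its score is not stated in this abstract.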