Signature-Based Retrieval of Scanned Documents Using Conditional Random Fields

2009 
In searching a large repository of scanned documents, one task of interest is retrieving documents from a database using a signature image as the query. This chapter presents a signature retrieval strategy based on document indexing and retrieval. Indexing uses (i) a model based on Conditional Random Fields (CRF) that labels extracted segments of scanned documents as Machine-Print, Signature or Noise, (ii) a support vector machine (SVM) technique that removes noise and printed text overlapping the signature images, and (iii) a global shape-based feature extractor applied to each signature image. The documents are first segmented into patches using a region-growing algorithm, and the CRF-based model is then used to infer the label of each patch. The method's robustness comes from the CRF's ability to model spatial dependencies among the labels of neighboring patches as well as between the labels and the observed data. The model parameters are learned by maximizing the pseudo-likelihood with conjugate gradient descent and line search, and the labels are inferred by computing their probabilities under the model with Gibbs sampling. Further post-processing of the labeled patches yields the signature regions used to index the documents. Retrieval is performed by a matching algorithm that compares the query with the indexed documents. Signature matching uses a normalized correlation similarity measure on global shape-based binary feature vectors. The end-to-end system is a content-based image retrieval system designed for signatures.
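The abstract says matching uses a normalized correlation similarity on global shape-based binary feature vectors but does not give the formula. The sketch below assumes that measure is the phi coefficient (Pearson correlation specialized to 0/1 vectors), and it also assumes that each document is indexed by the list of binary feature vectors of its extracted signatures and is scored by its best-matching signature. The names `correlation_similarity` and `rank_documents`, the max-over-signatures aggregation, and the toy 8-bit vectors are all hypothetical illustrations; real shape feature vectors would be far longer.

```python
import math
from typing import Dict, List, Sequence, Tuple


def correlation_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi (normalized correlation) coefficient of two equal-length binary vectors.

    Returns a value in [-1, 1]: 1 for identical vectors, -1 for complementary ones.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    # Count the four bit-agreement patterns between the two vectors.
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(x, y):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Correlation is undefined when a vector is constant (all 0s or all 1s);
        # treat exact equality as a perfect match and anything else as unrelated.
        return 1.0 if n10 == 0 and n01 == 0 else 0.0
    return (n11 * n00 - n10 * n01) / denom


def rank_documents(
    query: Sequence[int], index: Dict[str, List[Sequence[int]]]
) -> List[Tuple[str, float]]:
    """Score each document by its best-matching indexed signature, highest first."""
    scores: List[Tuple[str, float]] = []
    for doc_id, signatures in index.items():
        if not signatures:
            continue  # no signature was extracted from this document
        best = max(correlation_similarity(query, sig) for sig in signatures)
        scores.append((doc_id, best))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


if __name__ == "__main__":
    query = [1, 1, 0, 0, 1, 0, 1, 0]
    index = {
        "doc_a": [[1, 1, 0, 0, 1, 0, 1, 0]],
        "doc_b": [[0, 0, 1, 1, 0, 1, 0, 1]],
        "doc_c": [[1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 1, 0, 1]],
    }
    for doc_id, score in rank_documents(query, index):
        print(f"{doc_id}: {score:.3f}")
```

Run as a script, this ranks `doc_a` (an identical signature, score 1.000) first, then `doc_c` (0.577 from its closer signature), then `doc_b` (a complementary vector, -1.000). Because the phi coefficient uses all four agreement counts, bits that are 0 in both vectors also contribute to the score, unlike Jaccard-style measures that ignore them.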