An improved algorithm for identifying mathematical formulas in the images of PDF documents

2015 
Mathematical formula identification is an important part of mathematical formula recognition and retrieval. Extracting formulas from document images in PDF files is particularly difficult because such images are acquired in diverse ways. To address this problem, this paper proposes a method for identifying mathematical formulas in images of English PDF documents, consisting of three steps: judging columns, extracting mathematical formula character blocks, and merging mathematical formula character blocks. By analyzing and summarizing the characteristics of document images in PDF files and their effects on formula identification, the paper also designs a parameter adjustment algorithm that prevents variations in resolution from degrading identification performance. Experimental results show that the adaptability of the formula identification algorithm is improved in some applications.
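
The abstract gives no implementation details, so the following is only a minimal sketch of how the column-judging step and the resolution-dependent parameter adjustment might fit together. It assumes that thresholds are expressed in pixels and scaled linearly with image resolution, and that columns are judged from a vertical projection profile. The names `Thresholds`, `scale_thresholds`, `count_columns`, and the 300 dpi baseline are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Assumed baseline resolution at which the thresholds were tuned (hypothetical).
REFERENCE_DPI = 300


@dataclass(frozen=True)
class Thresholds:
    min_column_gap: int      # empty pixel columns needed to separate two text columns
    max_merge_distance: int  # largest pixel gap between blocks that may still be merged


def scale_thresholds(base: Thresholds, dpi: float,
                     reference_dpi: float = REFERENCE_DPI) -> Thresholds:
    """Scale pixel thresholds linearly with resolution (an assumed adjustment rule)."""
    factor = dpi / reference_dpi
    return Thresholds(
        min_column_gap=max(1, round(base.min_column_gap * factor)),
        max_merge_distance=max(1, round(base.max_merge_distance * factor)),
    )


def count_columns(column_ink: list[int], min_gap: int) -> int:
    """Count text columns from a vertical projection profile.

    column_ink[x] is the number of dark pixels in pixel column x. A run of at
    least min_gap empty pixel columns is treated as a gutter between text columns.
    """
    columns = 0
    gap = min_gap  # treat the left page edge as a gutter so the first inked run counts
    for ink in column_ink:
        if ink > 0:
            if gap >= min_gap:
                columns += 1  # first inked pixel column after a wide enough gutter
            gap = 0
        else:
            gap += 1
    return columns


if __name__ == "__main__":
    base = Thresholds(min_column_gap=30, max_merge_distance=12)
    print(scale_thresholds(base, dpi=150))
    print(count_columns([0, 0, 5, 5, 0, 5, 5, 0, 0, 0, 0, 5, 5], min_gap=3))
```

Under these assumptions, halving the resolution halves each threshold, so a gutter that spans 30 pixels at 300 dpi is still recognized when the same page is rendered at 150 dpi. The block extraction and merging steps would consume `max_merge_distance` in the same way, but their rules are not described in the abstract and are not sketched here.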