Chinese semantic document classification based on strategies of semantic similarity computation and correlation analysis

2020 
Abstract Document classification has become an indispensable technology for intelligent information services. It is often applied to tasks such as document organization, analysis, and archiving, or implemented as a submodule to support higher-level applications. Semantic analysis has been shown to improve the performance of document classification. Although semantic analysis has been incorporated into previous automatic document classification methods, the growing number of documents stored online has drawn greater attention to the use of semantic information for document classification, as it can greatly reduce human effort. In this paper, we propose two semantic document classification strategies for two types of semantic problems: (1) a novel semantic similarity computation (SSC) method to solve the polysemy problem and (2) a strong correlation analysis method (SCM) to solve the synonym problem. Experimental results indicate that, compared with traditional machine learning, n-gram, and contextualized word embedding methods, efficient semantic similarity computation and correlation analysis make it possible to eliminate word ambiguity and extract useful features, which improves the accuracy of semantic document classification for Chinese texts.