Word Web Cluster on Sparse Data of Social Network Based on Thematic Tree

2017 
Theme clustering is a classical and pivotal analysis in public opinion monitoring systems for social networks. The main procedure is to extract a set of keywords from the textual content and build a vector space model to calculate the similarity between data items. However, simple text processing gives poor results, because the learned knowledge derives only from the text document itself, without any semantic features, and textual features are sparse in short documents. Prior methods based on language models from natural language processing have been proposed to calculate similarity, and their results prove to be better than those of the vector model. We extend the language model to clustering and leverage the characteristics of social networks to cluster post data. Our method is designed to be incremental, since online data arrive continuously in a social network. We build a thematic tree that captures additional knowledge about word terms via a neural network language model, and use the topic hierarchy to calculate the similarity of short texts. We measure cluster quality, which shows a substantial improvement with the thematic tree.
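
The baseline procedure the abstract describes (keyword extraction, a vector space model, and similarity-based clustering of arriving posts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' thematic tree method: the tokenizer, the `IncrementalClusterer` class, the single-pass assignment rule, and the `threshold` value are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def keyword_vector(text):
    """Term-frequency vector; a crude stand-in for keyword extraction."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clustering: each new post joins the most similar
    cluster, or starts a new one if no cluster is similar enough.
    (Assumed scheme for illustration; the threshold is arbitrary.)"""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []  # summed term vectors, one per cluster
        self.members = []    # posts assigned to each cluster

    def add(self, post):
        vec = keyword_vector(post)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # fold post into the cluster
            self.members[best].append(post)
            return best
        self.centroids.append(Counter(vec))
        self.members.append([post])
        return len(self.centroids) - 1


clusterer = IncrementalClusterer(threshold=0.3)
labels = [
    clusterer.add("flood warning in the city center"),
    clusterer.add("city center flood warning issued"),
    clusterer.add("new phone release date announced"),
]
```

The sketch also shows the sparsity problem the abstract raises: two short posts about the same event that share no surface words, such as "heavy rain floods downtown" and "flood warning in the city center", get a cosine similarity of exactly zero, which is the gap that semantic knowledge from a language model is meant to close.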