Short Text Clustering with a Deep Multi-embedded Self-supervised Model

2021 
Short text clustering is challenging in the field of Natural Language Processing (NLP) since it is hard to learn discriminative representations from limited information. In this paper, fused multi-embedded features are employed to enhance the representations of short texts. A denoising autoencoder with an attention layer is then adopted to extract low-dimensional features from the multi-embeddings while resisting the disturbance of noisy texts. Furthermore, we propose a novel distribution estimation that jointly utilizes the soft cluster assignment and a prior target-distribution transition to better fine-tune the encoder. Combining the above components, we propose a deep multi-embedded self-supervised model (DMESSM) for short text clustering. Head-to-head comparisons with state-of-the-art methods on benchmark datasets indicate that our method outperforms them.
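As context for the self-supervised fine-tuning step, the sketch below illustrates the standard DEC-style formulation that such methods typically build on: a Student's t-kernel soft cluster assignment over the encoded embeddings and a sharpened auxiliary target distribution trained with a KL-divergence loss. The exact form of the paper's proposed target-distribution transition is not specified in the abstract, so the function names and the `alpha` parameter here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """Student's t-kernel similarity between embedded points z (n x d)
    and cluster centroids (k x d), normalized per point (soft assignment q)."""
    # squared Euclidean distances, shape (n, k)
    dist2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened auxiliary distribution p used as the self-training target."""
    weight = q ** 2 / q.sum(axis=0)          # emphasize high-confidence assignments
    return weight / weight.sum(axis=1, keepdims=True)

# toy usage: 2-D embeddings, 2 clusters (values are illustrative only)
z = np.array([[0.1, 0.2], [0.0, 0.1], [1.0, 1.1], [0.9, 1.2]])
centroids = np.array([[0.0, 0.1], [1.0, 1.1]])
q = soft_assignment(z, centroids)
p = target_distribution(q)
kl = (p * np.log(p / q)).sum()               # KL(P || Q), the clustering loss
print(q.round(3), p.round(3), round(float(kl), 4))
```

In such self-training schemes, the encoder is fine-tuned by minimizing KL(P || Q), with P periodically recomputed from Q so that high-confidence assignments gradually sharpen the clusters.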