A Non-Parametric Multi-Lingual Clustering Model for Temporal Short Text

2020 
Short text data is continuously generated by social streams such as Facebook and Twitter. Clustering this temporal text to identify new topics across huge volumes of data has recently become a very challenging task. Apart from supervised approaches, most existing clustering approaches assume that the input data belong to a single language, whereas multilingual short text is commonly observed in large amounts on social media. In this paper, we propose a model that clusters an unknown number of topics in a temporal environment for multilingual data. The proposed framework integrates a non-parametric Dirichlet model with a language translation component (NDML) to cluster the temporal stream of short text data, and transforms the cluster features into a uniform language vector representation. We conducted experiments on real-time crisis data to evaluate the accuracy of the proposed model.
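The combination of a translation step with non-parametric clustering can be illustrated with a small sketch. It is not the authors' NDML implementation. It is a simplified, greedy one-pass variant under stated assumptions. Each incoming document is first mapped into a common language through a hypothetical word-level lexicon (`TRANSLATE`), which stands in for the translation component. It is then assigned either to an existing cluster, scored by cluster size (a Chinese-restaurant-process style prior), or to a new cluster, scored by the concentration parameter `alpha`. In both cases the score is multiplied by a Dirichlet-multinomial word likelihood of the kind used in Dirichlet-mixture short-text clustering. The names `to_common_language`, `Cluster`, `log_word_likelihood`, `cluster_stream`, and the parameters `alpha`, `beta`, and `vocab_size` are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical bilingual lexicon standing in for the translation component (assumption).
TRANSLATE = {"inondation": "flood", "ville": "city", "aide": "help"}


def to_common_language(tokens, lexicon=TRANSLATE):
    """Map each token into the shared language; unknown tokens pass through unchanged."""
    return [lexicon.get(t, t) for t in tokens]


class Cluster:
    """Sufficient statistics for one cluster: document count, word count, word frequencies."""

    def __init__(self):
        self.n_docs = 0
        self.n_words = 0
        self.word_counts = Counter()

    def add(self, tokens):
        self.n_docs += 1
        self.n_words += len(tokens)
        self.word_counts.update(tokens)


def log_word_likelihood(tokens, cluster, beta, vocab_size):
    """Log Dirichlet-multinomial predictive probability of `tokens` under `cluster`.

    Passing cluster=None scores the document against an empty (new) cluster.
    """
    counts = cluster.word_counts if cluster else Counter()
    n_words = cluster.n_words if cluster else 0
    logp = 0.0
    # Numerator: for each word w seen c times in the document, prod_{j<c} (n_w + beta + j).
    for w, c in Counter(tokens).items():
        for j in range(c):
            logp += math.log(counts[w] + beta + j)
    # Denominator: prod_{i<len(doc)} (n_words + V * beta + i).
    for i in range(len(tokens)):
        logp -= math.log(n_words + vocab_size * beta + i)
    return logp


def cluster_stream(docs, alpha=1.0, beta=0.01, vocab_size=100):
    """Greedy single pass: each document joins the best-scoring existing cluster or opens a new one."""
    clusters, labels = [], []
    for doc in docs:
        tokens = to_common_language(doc)
        # Existing cluster k: log(n_k) + log likelihood of the document under cluster k.
        scores = [
            math.log(c.n_docs) + log_word_likelihood(tokens, c, beta, vocab_size)
            for c in clusters
        ]
        # New cluster: log(alpha) + log likelihood under an empty cluster.
        new_score = math.log(alpha) + log_word_likelihood(tokens, None, beta, vocab_size)
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or new_score > scores[best]:
            clusters.append(Cluster())
            best = len(clusters) - 1
        clusters[best].add(tokens)
        labels.append(best)
    return labels, clusters


docs = [["flood", "city"], ["inondation", "ville"], ["earthquake", "damage"]]
labels, clusters = cluster_stream(docs)
print(labels)  # [0, 0, 1]
```

In this toy stream, the French tweet is translated to the same words as the first English one and joins its cluster, while the unrelated third document opens a new cluster. The number of clusters is therefore not fixed in advance. A full model would typically sample assignments rather than take the maximum, and would handle cluster aging over time. Both are omitted here for brevity.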