Model-based Clustering of Short Text Streams

Authors:
Jianhua Yin, School of Computer Science and Technology, Shandong University
Daren Chao, School of Computer Science and Technology, Shandong University
Zhongkun Liu, School of Computer Science and Technology, Shandong University
Wei Zhang, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University
Xiaohui Yu, School of Computer Science and Technology, Shandong University
Jianyong Wang, Tsinghua University

Introduction:

This paper studies short text stream clustering. The authors propose a model-based short text stream clustering algorithm (MStream) that naturally handles both the concept drift problem and the sparsity problem.

Abstract:

Short text stream clustering has become an increasingly important problem due to the explosive growth of short text on diverse social media. In this paper, we propose a model-based short text stream clustering algorithm (MStream) which can deal with the concept drift problem and the sparsity problem naturally. The MStream algorithm can achieve state-of-the-art performance with only one pass of the stream, and can perform even better when we allow multiple iterations over each batch. We further propose an improved version of MStream with forgetting rules, called MStreamF, which can efficiently delete outdated documents by deleting the clusters of outdated batches. Our extensive experimental study shows that MStream and MStreamF can achieve better performance than three baselines on several real datasets.
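
The batch-level forgetting idea can be illustrated with a small sketch. The following is not the authors' implementation: the class name `StreamClusterer`, the `threshold` parameter, and the word-overlap score are all assumptions made for illustration, and the overlap score stands in for the model-based probabilities that MStream would compute. What the sketch does try to show is the bookkeeping: if each batch records what it contributed to each cluster, an outdated batch can be removed by subtracting its contribution, without revisiting individual documents.

```python
from collections import Counter, defaultdict


class StreamClusterer:
    """Toy one-pass clusterer with batch-level forgetting (illustrative only)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed cut-off below which a new cluster is opened
        self.clusters = {}          # cluster id -> Counter of word frequencies
        self.batch_stats = {}       # batch id -> {cluster id -> Counter added by that batch}
        self.next_id = 0

    def _similarity(self, words, counts):
        # Stand-in score (NOT the paper's model): fraction of the
        # document's words that already occur in the cluster.
        return sum(1 for w in words if counts[w] > 0) / len(words)

    def add_document(self, batch_id, words):
        """Assign one document (a non-empty list of words) in a single pass."""
        if not words:
            raise ValueError("document must contain at least one word")
        best_id, best_score = None, 0.0
        for cid, counts in self.clusters.items():
            score = self._similarity(words, counts)
            if score > best_score:
                best_id, best_score = cid, score
        # Open a new cluster when no existing cluster is similar enough.
        if best_id is None or best_score < self.threshold:
            best_id = self.next_id
            self.next_id += 1
            self.clusters[best_id] = Counter()
        doc = Counter(words)
        self.clusters[best_id] += doc
        # Remember what this batch contributed, so it can be undone later.
        self.batch_stats.setdefault(batch_id, defaultdict(Counter))[best_id] += doc
        return best_id

    def forget_batch(self, batch_id):
        """Remove an outdated batch by subtracting its per-cluster statistics."""
        for cid, delta in self.batch_stats.pop(batch_id, {}).items():
            self.clusters[cid] -= delta   # Counter subtraction drops non-positive counts
            if not self.clusters[cid]:    # cluster held only outdated documents
                del self.clusters[cid]
```

For example, after adding two similar documents in batch 0 and an unrelated one in batch 1, calling `forget_batch(0)` removes the first cluster entirely while leaving the cluster created in batch 1 untouched. The cost of forgetting depends on the number of clusters the batch touched, not on the number of documents in it.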

You may want to know: