Clustering Categorical Sequences with Variable-Length Tuples Representation

Liang Yuan, Zhiling Hong, Lifei Chen, Qiang Cai

2016

Clustering categorical sequences remains difficult because there is no efficient representation model for sequences. Existing models mainly represent sequences with fixed-length tuples; this paper proposes a new representation model based on variable-length tuples. A suffix tree is first built from fixed-length tuples with a large memory length, and a pruning method then deletes redundant tuples from the tree, using an entropy-based measure to evaluate the redundancy of each tuple. The tuples that remain in the pruned tree yield a normalized representation, on which a partitioning algorithm for clustering categorical sequences is defined. Experimental studies on six real-world sequence sets show the effectiveness and suitability of the proposed method for subsequence-based clustering.
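
The abstract does not give the pruning criterion or the normalization, so the following Python sketch only illustrates the general idea and should not be read as the authors' algorithm. It assumes a flat counter in place of a suffix tree, treats a tuple as redundant when its context changes the next-symbol entropy by no more than a threshold compared with the context shortened by one symbol, and normalizes each sequence vector to unit length. The names `count_tuples`, `next_entropy`, `select_tuples`, `represent` and the `threshold` parameter are all hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def count_tuples(seqs, max_len):
    """Count every contiguous tuple of length 1..max_len over all sequences."""
    counts = Counter()
    for seq in seqs:
        for n in range(1, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return counts


def next_entropy(counts, context, alphabet):
    """Shannon entropy (bits) of the symbol that follows `context`."""
    follow = [counts.get(context + (a,), 0) for a in alphabet]
    total = sum(follow)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in follow if c)


def select_tuples(counts, threshold):
    """Keep single symbols, plus longer tuples whose context is informative.

    Assumed criterion (not taken from the paper): a tuple is kept when its
    context changes the next-symbol entropy by more than `threshold`
    relative to the same context with its first symbol dropped.
    """
    alphabet = sorted(t[0] for t in counts if len(t) == 1)
    kept = []
    for t in sorted(counts, key=lambda t: (len(t), t)):
        context = t[:-1]
        if not context:
            kept.append(t)  # length-1 tuples are always kept
            continue
        shorter = next_entropy(counts, context[1:], alphabet)
        longer = next_entropy(counts, context, alphabet)
        if abs(shorter - longer) > threshold:
            kept.append(t)
    return kept


def represent(seq, kept, max_len):
    """Unit-length frequency vector of `seq` over the kept tuples."""
    counts = count_tuples([seq], max_len)
    vec = [counts.get(t, 0) for t in kept]
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


if __name__ == "__main__":
    seqs = ["abababab", "aabbaabb"]
    kept = select_tuples(count_tuples(seqs, 2), threshold=0.1)
    for s in seqs:
        print(s, represent(s, kept, 2))
```

The resulting vectors could then be passed to any partitioning method such as k-means; that step is omitted here because the abstract does not describe the clustering objective.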
