A novel incremental conceptual hierarchical text clustering method using CFu-tree
2015
This paper presents a novel down-top incremental conceptual hierarchical text clustering approach using a CFu-tree (ICHTC-CF) representation. Term-based feature extraction is used to summarize each cluster. A new criterion, Comparison Variation (CV), is introduced to judge whether clusters can be merged or split. The incremental clustering method is not sensitive to the order of the input data. Experimental results show that the method outperforms k-means, which indicates that the technique is efficient and feasible.

Clustering, one of the most important tools in information retrieval, is a data mining method that organizes data through unsupervised learning, meaning that it requires no training data. However, some text clustering algorithms cannot update existing clusters incrementally and must instead recompute a new clustering from scratch. In view of this, the paper presents a novel down-top (bottom-up) incremental conceptual hierarchical text clustering approach using a CFu-tree (ICHTC-CF) representation, which starts with each item as a separate cluster. Term-based feature extraction is used to summarize each cluster during the process. The Comparison Variation criterion is adopted to judge whether the closest pair of clusters can be merged or a previously formed cluster can be split. The incremental clustering method is also not sensitive to the order of the input data. Experimental results show that the method outperforms k-means, CLIQUE, single-linkage clustering, and complete-linkage clustering, which indicates that the technique is efficient and feasible.
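The down-top idea described above (start with each item as its own cluster, summarize clusters by their terms, and merge the closest pair only when a criterion allows it) can be illustrated with a minimal sketch. This is not the ICHTC-CF algorithm: the abstract does not define the Comparison Variation measure or the CFu-tree structure, so the sketch substitutes a plain cosine-similarity threshold as the merge test and omits splitting and incremental insertion. The function names and the `merge_threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Term-frequency summary of one document (a simple stand-in for term-based features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_bottom_up(docs, merge_threshold=0.3):
    """Merge the closest pair of clusters until no pair passes the threshold.

    ASSUMPTION: the threshold test replaces the paper's Comparison Variation
    criterion, which the abstract does not specify.
    """
    # Each document starts as a separate cluster, summarized by its term counts.
    clusters = [{"members": [i], "terms": term_vector(d)} for i, d in enumerate(docs)]
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i]["terms"], clusters[j]["terms"])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < merge_threshold:
            break  # the closest pair is not similar enough to merge
        merged = {
            "members": clusters[i]["members"] + clusters[j]["members"],
            "terms": clusters[i]["terms"] + clusters[j]["terms"],  # summed term counts
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


docs = [
    "apple banana fruit",
    "banana fruit apple pie",
    "car engine wheel",
    "engine wheel road",
]
for cluster in cluster_bottom_up(docs):
    print(sorted(cluster["members"]), cluster["terms"].most_common(3))
```

Summing term counts on merge keeps a compact per-cluster summary, so similarities are computed against the summary rather than against every member document; this is only loosely in the spirit of a cluster-feature representation.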
Keywords:
- Artificial intelligence
- Single-linkage clustering
- Machine learning
- k-medians clustering
- Complete-linkage clustering
- Cluster analysis
- Correlation clustering
- CURE data clustering algorithm
- Canopy clustering algorithm
- Brown clustering
- Computer science
- Pattern recognition
- Fuzzy clustering
- Data mining
- Hierarchical clustering
- Conceptual clustering