A frequent term based text clustering approach using novel similarity measure

G. Suresh Reddy, T. V. Rajinikanth, A. Ananda Rao

2014

Text clustering is an unsupervised process that relies solely on the similarity relationships between documents and outputs a set of clusters [14]. In this research, a commonality measure is defined to quantify what two text files have in common, and this measure is used as the similarity measure. The main idea is to apply an existing frequent itemset mining algorithm, such as Apriori or FP-tree, to the initial set of text files in order to reduce the dimensionality of the input. A document feature vector is then formed for each document, and a further vector is formed over the whole static set of input text files. Finally, the algorithm outputs a set of clusters from the initial set of input text files.
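The abstract does not state the exact commonality formula or the clustering procedure, so the following Python sketch fills those gaps with assumptions and is not the authors' method. Frequent terms are found with a single Apriori-style support count (level-1 itemsets only), each document becomes a binary feature vector over those frequent terms, `commonality` is a hypothetical Jaccard-style ratio (shared frequent terms divided by frequent terms present in either document), and `cluster` uses a hypothetical single-pass threshold rule. The function names and the `min_support` and `threshold` parameters are illustrative only.

```python
import re


def tokenize(text):
    """Lowercase a document and return its set of alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def frequent_terms(doc_terms, min_support):
    """Level-1 Apriori pass (assumption): keep terms found in >= min_support documents."""
    counts = {}
    for terms in doc_terms:
        for t in terms:
            counts[t] = counts.get(t, 0) + 1
    return sorted(t for t, c in counts.items() if c >= min_support)


def feature_vector(terms, vocabulary):
    """Binary document feature vector over the reduced (frequent-term) vocabulary."""
    return [1 if t in terms else 0 for t in vocabulary]


def commonality(u, v):
    """Hypothetical commonality: shared frequent terms / frequent terms in either document."""
    both = sum(a & b for a, b in zip(u, v))
    either = sum(a | b for a, b in zip(u, v))
    return both / either if either else 0.0


def cluster(docs, min_support=2, threshold=0.5):
    """Hypothetical single-pass clustering: join the first cluster whose seed is similar enough."""
    doc_terms = [tokenize(d) for d in docs]
    vocab = frequent_terms(doc_terms, min_support)  # dimensionality reduction step
    vectors = [feature_vector(t, vocab) for t in doc_terms]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            # Compare against the cluster's first (seed) document.
            if commonality(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    docs = [
        "apriori mines frequent itemsets",
        "fp tree mines frequent itemsets",
        "clustering groups similar documents",
        "clustering groups text documents",
    ]
    print(cluster(docs))  # [[0, 1], [2, 3]]
```

With `min_support=2`, terms that occur in only one document (such as "apriori" or "text") are dropped before any similarity is computed, which is the dimensionality reduction the abstract attributes to frequent itemset mining. A full Apriori or FP-tree pass would also yield multi-term itemsets; this sketch keeps only single terms for brevity.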

Keywords:

Fuzzy clustering
Cluster analysis
Feature vector
Correlation clustering
Similarity measure
Canopy clustering algorithm
Machine learning
Document clustering
Artificial intelligence
Pattern recognition
Computer science
Text mining
Unsupervised learning
Data mining
