Research on XML Element Search Results Clustering

2012 
Clustering XML search results is an effective way to improve performance. The key problem, however, is how to measure similarity between XML documents. This paper studies the clustering of XML search results at element granularity and proposes a similarity measurement method. The method first uses latent semantic indexing (LSI) to obtain term semantics and then combines the content and semantic structure properties of XML element nodes (CASS). To evaluate clustering performance, two new evaluation measures, R_ClusterRatio and R_DocuRatio, are introduced. They are motivated by observations of the distribution of relevant documents and by the fact that the experimental data collection, the IEEE CS corpus, does not provide classification information. Experimental results show that the proposed similarity method, which combines term semantics with the integration of content and structure semantics (LSI-CASS), is feasible and produces better clustering quality than LSI-CAS.
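
The abstract does not give the formulas behind LSI-CASS, so the following is only a minimal sketch of the general idea: derive term vectors with LSI, compare two XML elements by content in that latent space, and blend the result with a structure score. Everything specific here is an assumption made for illustration and not the paper's method: the toy term-by-element matrix, the shared-prefix `path_similarity` score, the linear blend in `combined_similarity`, and the weight `alpha`.

```python
import numpy as np

def lsi_term_vectors(term_by_element, k):
    """Rank-k LSI: return one k-dimensional vector per term (matrix row)."""
    u, s, _ = np.linalg.svd(term_by_element, full_matrices=False)
    return u[:, :k] * s[:k]

def element_vector(term_counts, term_vecs):
    """Count-weighted sum of the term vectors of one element (a simplification)."""
    return term_counts @ term_vecs

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))

def path_similarity(p, q):
    """Hypothetical structure score: shared tag-path prefix over the longer path."""
    shared = 0
    for a, b in zip(p, q):
        if a != b:
            break
        shared += 1
    return shared / max(len(p), len(q), 1)

def combined_similarity(content_sim, structure_sim, alpha=0.5):
    """Hypothetical linear blend; alpha is an assumed weight, not from the paper."""
    return alpha * content_sim + (1 - alpha) * structure_sim

# Toy term-by-element counts. Rows: xml, markup, cluster, grouping.
# Columns: five XML elements e0..e4 (invented data).
A = np.array([
    [1, 1, 0, 0, 0],   # xml
    [1, 0, 1, 0, 0],   # markup
    [0, 0, 0, 1, 1],   # cluster
    [0, 0, 0, 1, 0],   # grouping
], dtype=float)

term_vecs = lsi_term_vectors(A, k=2)
e1 = element_vector(A[:, 1], term_vecs)   # contains only "xml"
e2 = element_vector(A[:, 2], term_vecs)   # contains only "markup"
e4 = element_vector(A[:, 4], term_vecs)   # contains only "cluster"

raw_overlap = cosine(A[:, 1], A[:, 2])    # no shared terms in the raw counts
sim_related = cosine(e1, e2)              # related through co-occurrence in e0
sim_unrelated = cosine(e1, e4)

structure = path_similarity(["article", "sec", "p"], ["article", "sec", "title"])
combined = combined_similarity(sim_related, structure, alpha=0.5)
```

In this toy data, elements e1 and e2 share no term, yet LSI places them close together because "xml" and "markup" co-occur in e0; this is the kind of term semantics that plain term matching would miss.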