A Proposed Ranked Clustering Approach for Unstructured Data from Dataspace using VSM

2020 
Nowadays a huge amount of data is available in unstructured form, and users need useful information related to the query or phrase they enter into a search engine. Search engines rank and index data according to the nature of the data and documents: structured (SQL data), unstructured (e-books, PPT, text, streamed data, songs, movies, research data), or semi-structured (XML). Because of this heterogeneity, indexing and ranking are the main issues an information retrieval system faces in retrieving appropriate query results from a Dataspace. Indexing unstructured data can reduce query processing time and so speed up retrieval. This paper proposes a ranked clustering approach using modified cosine similarity and the vector space model (VSM), which may replace the traditional cosine similarity approach for better results on the dataset. We apply the vector space model, the document-term matrix, and TF-IDF weights to index and rank the heterogeneous data. Consequently, the documents that best match the query are displayed first, and documents are ranked by their similarity to the query over the unstructured data in the Dataspace.
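The ranking step described in the abstract can be illustrated with a minimal sketch. It builds TF-IDF vectors from a document-term count and orders documents by their similarity to the query. Two caveats: the abstract does not define the modified cosine similarity, so the sketch uses the traditional cosine similarity as a stand-in, and the clustering step is not shown. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank`), the whitespace tokenizer, and the smoothed IDF formula are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper may use
    # stemming, stop-word removal, or other preprocessing.
    return text.lower().split()


def build_idf(docs_tokens):
    # Document frequency: number of documents containing each term.
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped.
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(u, v):
    # Traditional cosine similarity between two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Return (document index, score) pairs, best match first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vecs = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine(q_vec, dv)) for i, dv in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank("document ranking model", docs)` on a small list of strings returns the document indices ordered by similarity, so documents sharing more highly weighted terms with the query appear first. Swapping `cosine` for a different similarity function is the natural place to experiment with a modified measure.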