An Overview on Supervised Semi-structured Data Classification

2021 
Many collaboratively built resources, such as Wikipedia, Weibo and Quora, exist in the form of semi-structured data. Semi-structured data is widely used in areas such as data integration, data distribution, data storage, data management, information retrieval and knowledge management. For the large volumes of semi-structured data on the Web, semi-structured data classification techniques can group documents into categories according to their structure and/or content. Supervised semi-structured data classification plays an important role in many applications. This paper provides an overview of the literature on supervised semi-structured data classification. A general framework for semi-structured data classification is presented, which consists mainly of two steps: feature extraction and model building. Several representation models of semi-structured data are discussed, chiefly the rooted labeled tree model, the feature vector space model and the feature set model. A large selection of semi-structured data classification approaches is reviewed in detail from two perspectives: those based on structure only and those based on both structure and content. Finally, several future research directions for semi-structured data classification are presented.
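The two-step framework named in the abstract (feature extraction, then model building) can be illustrated with a minimal structure-only sketch. Everything here is an assumption made for illustration rather than a method taken from the paper: the documents are toy XML strings, the features are counts of root-to-node tag paths (one simple form of a feature vector space representation), and the model is a nearest-centroid classifier under cosine similarity. The function names `path_features`, `train` and `classify` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Step 1 (feature extraction): count root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_docs):
    """Step 2 (model building): sum the path counts of each class."""
    centroids = {}
    for xml_text, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(path_features(xml_text))
    return centroids


def classify(centroids, xml_text):
    """Assign the label whose summed vector is most similar to the document."""
    feats = path_features(xml_text)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Toy labeled corpus (invented for this sketch).
TRAIN = [
    ("<article><title/><section><para/></section></article>", "article"),
    ("<article><title/><section><para/><para/></section></article>", "article"),
    ("<post><user/><text/></post>", "post"),
    ("<post><user/><text/><reply><user/><text/></reply></post>", "post"),
]

model = train(TRAIN)
print(classify(model, "<post><user/><text/><reply><text/></reply></post>"))  # prints "post"
```

A classifier based on both structure and content could extend `path_features` so that each path is paired with the text terms found at that node; this sketch deliberately ignores element text and attributes.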