Design and Execution of ETL Process to Build Topic Dimension from User-Generated Content

2021 
Recent studies on multi-dimensional design have combined business data with User-Generated Content (UGC). They have integrated new analytical aspects, such as user behavior, sentiments, opinions, and topics of interest, to improve decision analysis. In this paper, we address the complexity of designing a topic dimension schema, which stems from the dynamicity and heterogeneity of its hierarchies. Researchers have partially addressed this issue by offering technical solutions for topic detection, without focusing on the Extraction, Transformation and Loading (ETL) process that integrates the detected topics into a multi-dimensional schema. Our contribution consists in modeling the ETL steps that generate valid topic dimension hierarchies from informal UGC texts. We propose a generic ETL4SocialTopic process model that defines a set of operations executed in a specific order. The implementation of these steps offers a set of customized jobs that simplify the ETL designer's work by automating a large part of the process. Experimental results show that ETL4SocialTopic consistently designs valid topic dimension schemas in several contexts.
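To make the idea of ordered ETL operations that yield a topic dimension more concrete, the following is a minimal sketch in Python. It is not the ETL4SocialTopic model itself: the keyword lookup tables, the two-level hierarchy (topic, category), and the function names `extract`, `transform`, `load` and `run_pipeline` are all hypothetical, and topic detection is reduced to a simple dictionary lookup for illustration.

```python
from collections import defaultdict

# Hypothetical lookup tables (assumptions, not taken from the paper):
# keyword -> topic, and topic -> parent category in the hierarchy.
KEYWORD_TO_TOPIC = {
    "battery": "Battery life",
    "charger": "Battery life",
    "screen": "Display",
    "pixel": "Display",
}
TOPIC_TO_CATEGORY = {
    "Battery life": "Hardware",
    "Display": "Hardware",
}


def extract(posts):
    """Extraction: yield (post_id, lower-cased tokens) from raw informal text."""
    for post_id, text in posts:
        tokens = [t.strip(".,!?;:").lower() for t in text.split()]
        yield post_id, [t for t in tokens if t]


def transform(extracted):
    """Transformation: map tokens to topics and collect post ids per topic."""
    topic_posts = defaultdict(set)
    for post_id, tokens in extracted:
        for token in tokens:
            topic = KEYWORD_TO_TOPIC.get(token)
            if topic is not None:
                topic_posts[topic].add(post_id)
    return topic_posts


def load(topic_posts):
    """Loading: build dimension rows with surrogate keys, ordered by topic name."""
    rows = []
    for key, topic in enumerate(sorted(topic_posts), start=1):
        rows.append({
            "topic_key": key,
            "topic": topic,
            "category": TOPIC_TO_CATEGORY[topic],
            "post_count": len(topic_posts[topic]),
        })
    return rows


def run_pipeline(posts):
    """Run the three steps in a fixed order: extract, transform, load."""
    return load(transform(extract(posts)))
```

Calling `run_pipeline` on a list of `(post_id, text)` pairs returns one row per detected topic, each carrying its parent category, which stands in for a single level of a topic dimension hierarchy. A real process would replace the lookup with a topic detection technique and validate the resulting hierarchies before loading.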