PatchIndex - Exploiting Approximate Constraints in Self-managing Databases

2020 
In cloud environments, data warehouse solutions need to be self-managing in order to be usable without prior database administration knowledge. Additionally, data in these environments is typically not clean, as it is imported from various sources. As a consequence, automatic schema optimization, an important task of self-management, becomes difficult without human interaction and data cleaning steps. In this paper, we focus on constraint discovery as a subtask of schema optimization. Real-world datasets with unclean data may not contain perfect constraints, because a small fraction of the values violates them and so prevents their definition. Therefore, we introduce the PatchIndex structure, which handles these exceptions to column constraints and enables self-management tools to discover and define approximate constraints on unclean data. We present “nearly unique column” and “nearly sorted column” constraints, both managed by the generic PatchIndex structure. Furthermore, we provide mechanisms to discover these constraints and show how query performance can benefit from them in different use cases by integrating them into query optimization. Our evaluation shows that the PatchIndex structure offers opportunities for a significant performance boost in different use cases while enabling self-management tools to define constraints on unclean data.
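The idea of keeping the exceptions to a constraint in a separate structure can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`unique_patches`, `sorted_patches`, `count_distinct`), the use of a plain set of row ids as the "patches", and the choice of a longest non-decreasing subsequence to define the sorted part are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter


def unique_patches(column):
    """Row ids whose value occurs more than once (assumed patch definition).

    All rows outside this set hold values that appear exactly once.
    """
    counts = Counter(column)
    return {i for i, v in enumerate(column) if counts[v] > 1}


def sorted_patches(column):
    """Row ids outside one longest non-decreasing subsequence (assumed definition).

    Removing these rows leaves a sorted column.
    """
    tails = []      # tails[k]: row id ending the best subsequence of length k + 1
    tail_vals = []  # values at those row ids, kept sorted for binary search
    prev = [None] * len(column)
    for i, v in enumerate(column):
        k = bisect_right(tail_vals, v)
        prev[i] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(i)
            tail_vals.append(v)
        else:
            tails[k] = i
            tail_vals[k] = v
    # Walk back from the end of the longest subsequence to collect kept rows.
    keep = set()
    i = tails[-1] if tails else None
    while i is not None:
        keep.add(i)
        i = prev[i]
    return set(range(len(column))) - keep


def count_distinct(column, patches):
    """Distinct count that deduplicates only the patch rows.

    Rows outside the patches are unique and share no value with patch rows,
    so they can simply be counted.
    """
    clean_rows = len(column) - len(patches)
    return clean_rows + len({column[i] for i in patches})


nearly_unique = [1, 2, 2, 3, 4, 4, 4]
nearly_sorted = [1, 2, 9, 3, 4, 5]
nuc = unique_patches(nearly_unique)
nsc = sorted_patches(nearly_sorted)
distinct = count_distinct(nearly_unique, nuc)
```

In this sketch, the distinct count only has to deduplicate the rows listed in the patch set, which hints at how a query optimizer could split a query into a cheap part over the constrained rows and a regular part over the exceptions.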